Keyrings Archive on lore.kernel.org
 help / color / Atom feed
* let import_iovec deal with compat_iovecs as well
@ 2020-09-18 12:45 Christoph Hellwig
  2020-09-18 12:45 ` [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag Christoph Hellwig
                   ` (11 more replies)
  0 siblings, 12 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-18 12:45 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

Hi Al,

this series changes import_iovec to transparently deal with comat iovec
structures, and then cleanups up a lot of code dupliation.  But to get
there it first has to fix the pre-existing bug that io_uring compat
contexts don't trigger the in_compat_syscall() check.  This has so far
been relatively harmless as very little code callable from io_uring used
the check, and even that code that could be called usually wasn't.

Diffstat
 arch/arm64/include/asm/unistd32.h                  |   10 
 arch/mips/kernel/syscalls/syscall_n32.tbl          |   10 
 arch/mips/kernel/syscalls/syscall_o32.tbl          |   10 
 arch/parisc/kernel/syscalls/syscall.tbl            |   10 
 arch/powerpc/kernel/syscalls/syscall.tbl           |   10 
 arch/s390/kernel/syscalls/syscall.tbl              |   10 
 arch/sparc/include/asm/compat.h                    |    3 
 arch/sparc/kernel/syscalls/syscall.tbl             |   10 
 arch/x86/entry/syscall_x32.c                       |    5 
 arch/x86/entry/syscalls/syscall_32.tbl             |   10 
 arch/x86/entry/syscalls/syscall_64.tbl             |   10 
 arch/x86/include/asm/compat.h                      |    2 
 block/scsi_ioctl.c                                 |   12 
 drivers/scsi/sg.c                                  |    9 
 fs/aio.c                                           |   38 --
 fs/io_uring.c                                      |   21 -
 fs/read_write.c                                    |  307 ++++-----------------
 fs/splice.c                                        |   57 ---
 include/linux/compat.h                             |   29 -
 include/linux/fs.h                                 |    7 
 include/linux/sched.h                              |    1 
 include/linux/uio.h                                |    7 
 include/uapi/asm-generic/unistd.h                  |   12 
 lib/iov_iter.c                                     |   30 --
 mm/process_vm_access.c                             |   69 ----
 net/compat.c                                       |    4 
 security/keys/compat.c                             |   37 --
 security/keys/internal.h                           |    5 
 security/keys/keyctl.c                             |    2 
 tools/include/uapi/asm-generic/unistd.h            |   12 
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |   10 
 tools/perf/arch/s390/entry/syscalls/syscall.tbl    |   10 
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |   10 
 33 files changed, 207 insertions(+), 582 deletions(-)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-18 12:45 let import_iovec deal with compat_iovecs as well Christoph Hellwig
@ 2020-09-18 12:45 ` Christoph Hellwig
  2020-09-18 13:40   ` Al Viro
  2020-09-20 15:15   ` Matthew Wilcox
  2020-09-18 12:45 ` [PATCH 2/9] compat.h: fix a spelling error in <linux/compat.h> Christoph Hellwig
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-18 12:45 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

Add a flag to force processing a syscall as a compat syscall.  This is
required so that in_compat_syscall() works for I/O submitted by io_uring
helper threads on behalf of compat syscalls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/sparc/include/asm/compat.h | 3 ++-
 arch/x86/include/asm/compat.h   | 2 +-
 fs/io_uring.c                   | 9 +++++++++
 include/linux/compat.h          | 5 ++++-
 include/linux/sched.h           | 1 +
 5 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/include/asm/compat.h b/arch/sparc/include/asm/compat.h
index 40a267b3bd5208..fee6c51d36e869 100644
--- a/arch/sparc/include/asm/compat.h
+++ b/arch/sparc/include/asm/compat.h
@@ -211,7 +211,8 @@ static inline int is_compat_task(void)
 static inline bool in_compat_syscall(void)
 {
 	/* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */
-	return pt_regs_trap_type(current_pt_regs()) = 0x110;
+	return pt_regs_trap_type(current_pt_regs()) = 0x110 ||
+		(current->flags & PF_FORCE_COMPAT);
 }
 #define in_compat_syscall in_compat_syscall
 #endif
diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index d4edf281fff49d..fbab072d4e5b31 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -208,7 +208,7 @@ static inline bool in_32bit_syscall(void)
 #ifdef CONFIG_COMPAT
 static inline bool in_compat_syscall(void)
 {
-	return in_32bit_syscall();
+	return in_32bit_syscall() || (current->flags & PF_FORCE_COMPAT);
 }
 #define in_compat_syscall in_compat_syscall	/* override the generic impl */
 #endif
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3790c7fe9fee22..5755d557c3f7bc 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5449,6 +5449,9 @@ static int io_req_defer_prep(struct io_kiocb *req,
 	if (unlikely(ret))
 		return ret;
 
+	if (req->ctx->compat)
+		current->flags |= PF_FORCE_COMPAT;
+
 	switch (req->opcode) {
 	case IORING_OP_NOP:
 		break;
@@ -5546,6 +5549,7 @@ static int io_req_defer_prep(struct io_kiocb *req,
 		break;
 	}
 
+	current->flags &= ~PF_FORCE_COMPAT;
 	return ret;
 }
 
@@ -5669,6 +5673,9 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
 
+	if (ctx->compat)
+		current->flags |= PF_FORCE_COMPAT;
+
 	switch (req->opcode) {
 	case IORING_OP_NOP:
 		ret = io_nop(req, cs);
@@ -5898,6 +5905,8 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		break;
 	}
 
+	current->flags &= ~PF_FORCE_COMPAT;
+
 	if (ret)
 		return ret;
 
diff --git a/include/linux/compat.h b/include/linux/compat.h
index b354ce58966e2d..685066f7ad325f 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -891,7 +891,10 @@ asmlinkage long compat_sys_socketcall(int call, u32 __user *args);
  */
 
 #ifndef in_compat_syscall
-static inline bool in_compat_syscall(void) { return is_compat_task(); }
+static inline bool in_compat_syscall(void)
+{
+	return is_compat_task() || (current->flags & PF_FORCE_COMPAT);
+}
 #endif
 
 /**
diff --git a/include/linux/sched.h b/include/linux/sched.h
index afe01e232935fa..c8b183b5655a1e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1491,6 +1491,7 @@ extern struct pid *cad_pid;
  */
 #define PF_IDLE			0x00000002	/* I am an IDLE thread */
 #define PF_EXITING		0x00000004	/* Getting shut down */
+#define PF_FORCE_COMPAT		0x00000008	/* acting as compat task */
 #define PF_VCPU			0x00000010	/* I'm a virtual CPU */
 #define PF_WQ_WORKER		0x00000020	/* I'm a workqueue worker */
 #define PF_FORKNOEXEC		0x00000040	/* Forked but didn't exec */
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 2/9] compat.h: fix a spelling error in <linux/compat.h>
  2020-09-18 12:45 let import_iovec deal with compat_iovecs as well Christoph Hellwig
  2020-09-18 12:45 ` [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag Christoph Hellwig
@ 2020-09-18 12:45 ` Christoph Hellwig
  2020-09-18 13:37   ` Johannes Thumshirn
  2020-09-18 12:45 ` [PATCH 3/9] fs: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector Christoph Hellwig
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-18 12:45 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

We only have not compat_sys_readv64v2 syscall, only a
compat_sys_preadv64v2 syscall one.  This probably worked given that the
syscall was not referenced from anywhere but the x86 syscall table.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/compat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 685066f7ad325f..69968c124b3cad 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -812,7 +812,7 @@ asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
 		const struct compat_iovec __user *vec,
 		compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
-asmlinkage long  compat_sys_readv64v2(unsigned long fd,
+asmlinkage long  compat_sys_preadv64v2(unsigned long fd,
 		const struct compat_iovec __user *vec,
 		unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 3/9] fs: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector
  2020-09-18 12:45 let import_iovec deal with compat_iovecs as well Christoph Hellwig
  2020-09-18 12:45 ` [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag Christoph Hellwig
  2020-09-18 12:45 ` [PATCH 2/9] compat.h: fix a spelling error in <linux/compat.h> Christoph Hellwig
@ 2020-09-18 12:45 ` Christoph Hellwig
  2020-09-18 12:56   ` Matthew Wilcox
  2020-09-18 13:39   ` Johannes Thumshirn
  2020-09-18 12:45 ` [PATCH 4/9] fs: handle the compat case in import_iovec Christoph Hellwig
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-18 12:45 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

Explicitly check for the magic value insted of implicitly relying on
its number representation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c78d0dd..f153116bc5399b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -840,8 +840,7 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
 			ret = -EINVAL;
 			goto out;
 		}
-		if (type >= 0
-		    && unlikely(!access_ok(buf, len))) {
+		if (type != CHECK_IOVEC_ONLY && unlikely(!access_ok(buf, len))) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -911,7 +910,7 @@ ssize_t compat_rw_copy_check_uvector(int type,
 		}
 		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
 			goto out;
-		if (type >= 0 &&
+		if (type != CHECK_IOVEC_ONLY &&
 		    !access_ok(compat_ptr(buf), len)) {
 			ret = -EFAULT;
 			goto out;
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 4/9] fs: handle the compat case in import_iovec
  2020-09-18 12:45 let import_iovec deal with compat_iovecs as well Christoph Hellwig
                   ` (2 preceding siblings ...)
  2020-09-18 12:45 ` [PATCH 3/9] fs: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector Christoph Hellwig
@ 2020-09-18 12:45 ` Christoph Hellwig
  2020-09-18 12:45 ` [PATCH 5/9] fs: remove various compat readv/writev helpers Christoph Hellwig
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-18 12:45 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

Use in compat_syscall to import either native or the compat iovecs, and
remove the now superflous compat_import_iovec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/scsi_ioctl.c     |  12 +---
 drivers/scsi/sg.c      |   9 +--
 fs/aio.c               |  38 +++++-------
 fs/io_uring.c          |  12 +---
 fs/read_write.c        | 127 +++++++++++++++--------------------------
 fs/splice.c            |   2 +-
 include/linux/compat.h |   6 --
 include/linux/fs.h     |   7 +--
 include/linux/uio.h    |   7 ---
 lib/iov_iter.c         |  30 +---------
 mm/process_vm_access.c |   9 +--
 net/compat.c           |   4 +-
 security/keys/compat.c |   5 +-
 13 files changed, 83 insertions(+), 185 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index ef722f04f88a93..e08df86866ee5d 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -333,16 +333,8 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
 		struct iov_iter i;
 		struct iovec *iov = NULL;
 
-#ifdef CONFIG_COMPAT
-		if (in_compat_syscall())
-			ret = compat_import_iovec(rq_data_dir(rq),
-				   hdr->dxferp, hdr->iovec_count,
-				   0, &iov, &i);
-		else
-#endif
-			ret = import_iovec(rq_data_dir(rq),
-				   hdr->dxferp, hdr->iovec_count,
-				   0, &iov, &i);
+		ret = import_iovec(rq_data_dir(rq), hdr->dxferp,
+				   hdr->iovec_count, 0, &iov, &i);
 		if (ret < 0)
 			goto out_free_cdb;
 
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 20472aaaf630a4..bfa8d77322d732 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1820,14 +1820,7 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
 		struct iovec *iov = NULL;
 		struct iov_iter i;
 
-#ifdef CONFIG_COMPAT
-		if (in_compat_syscall())
-			res = compat_import_iovec(rw, hp->dxferp, iov_count,
-						  0, &iov, &i);
-		else
-#endif
-			res = import_iovec(rw, hp->dxferp, iov_count,
-					   0, &iov, &i);
+		res = import_iovec(rw, hp->dxferp, iov_count, 0, &iov, &i);
 		if (res < 0)
 			return res;
 
diff --git a/fs/aio.c b/fs/aio.c
index d5ec303855669d..b377f5c2048e18 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1478,8 +1478,7 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
 }
 
 static ssize_t aio_setup_rw(int rw, const struct iocb *iocb,
-		struct iovec **iovec, bool vectored, bool compat,
-		struct iov_iter *iter)
+		struct iovec **iovec, bool vectored, struct iov_iter *iter)
 {
 	void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf;
 	size_t len = iocb->aio_nbytes;
@@ -1489,11 +1488,6 @@ static ssize_t aio_setup_rw(int rw, const struct iocb *iocb,
 		*iovec = NULL;
 		return ret;
 	}
-#ifdef CONFIG_COMPAT
-	if (compat)
-		return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec,
-				iter);
-#endif
 	return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
 }
 
@@ -1517,8 +1511,7 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
 	}
 }
 
-static int aio_read(struct kiocb *req, const struct iocb *iocb,
-			bool vectored, bool compat)
+static int aio_read(struct kiocb *req, const struct iocb *iocb, bool vectored)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
@@ -1535,7 +1528,7 @@ static int aio_read(struct kiocb *req, const struct iocb *iocb,
 	if (unlikely(!file->f_op->read_iter))
 		return -EINVAL;
 
-	ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter);
+	ret = aio_setup_rw(READ, iocb, &iovec, vectored, &iter);
 	if (ret < 0)
 		return ret;
 	ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
@@ -1545,8 +1538,7 @@ static int aio_read(struct kiocb *req, const struct iocb *iocb,
 	return ret;
 }
 
-static int aio_write(struct kiocb *req, const struct iocb *iocb,
-			 bool vectored, bool compat)
+static int aio_write(struct kiocb *req, const struct iocb *iocb, bool vectored)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
@@ -1563,7 +1555,7 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb,
 	if (unlikely(!file->f_op->write_iter))
 		return -EINVAL;
 
-	ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter);
+	ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, &iter);
 	if (ret < 0)
 		return ret;
 	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
@@ -1799,8 +1791,7 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 }
 
 static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
-			   struct iocb __user *user_iocb, struct aio_kiocb *req,
-			   bool compat)
+			   struct iocb __user *user_iocb, struct aio_kiocb *req)
 {
 	req->ki_filp = fget(iocb->aio_fildes);
 	if (unlikely(!req->ki_filp))
@@ -1833,13 +1824,13 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 
 	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
-		return aio_read(&req->rw, iocb, false, compat);
+		return aio_read(&req->rw, iocb, false);
 	case IOCB_CMD_PWRITE:
-		return aio_write(&req->rw, iocb, false, compat);
+		return aio_write(&req->rw, iocb, false);
 	case IOCB_CMD_PREADV:
-		return aio_read(&req->rw, iocb, true, compat);
+		return aio_read(&req->rw, iocb, true);
 	case IOCB_CMD_PWRITEV:
-		return aio_write(&req->rw, iocb, true, compat);
+		return aio_write(&req->rw, iocb, true);
 	case IOCB_CMD_FSYNC:
 		return aio_fsync(&req->fsync, iocb, false);
 	case IOCB_CMD_FDSYNC:
@@ -1852,8 +1843,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 	}
 }
 
-static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
-			 bool compat)
+static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb)
 {
 	struct aio_kiocb *req;
 	struct iocb iocb;
@@ -1882,7 +1872,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	if (unlikely(!req))
 		return -EAGAIN;
 
-	err = __io_submit_one(ctx, &iocb, user_iocb, req, compat);
+	err = __io_submit_one(ctx, &iocb, user_iocb, req);
 
 	/* Done with the synchronous reference */
 	iocb_put(req);
@@ -1941,7 +1931,7 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 			break;
 		}
 
-		ret = io_submit_one(ctx, user_iocb, false);
+		ret = io_submit_one(ctx, user_iocb);
 		if (ret)
 			break;
 	}
@@ -1983,7 +1973,7 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 			break;
 		}
 
-		ret = io_submit_one(ctx, compat_ptr(user_iocb), true);
+		ret = io_submit_one(ctx, compat_ptr(user_iocb));
 		if (ret)
 			break;
 	}
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5755d557c3f7bc..dc888f911f04b4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2837,12 +2837,6 @@ static ssize_t __io_import_iovec(int rw, struct io_kiocb *req,
 		return ret;
 	}
 
-#ifdef CONFIG_COMPAT
-	if (req->ctx->compat)
-		return compat_import_iovec(rw, buf, sqe_len, UIO_FASTIOV,
-						iovec, iter);
-#endif
-
 	return import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter);
 }
 
@@ -4220,9 +4214,9 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
 		sr->len = iomsg->iov[0].iov_len;
 		iomsg->iov = NULL;
 	} else {
-		ret = compat_import_iovec(READ, uiov, len, UIO_FASTIOV,
-						&iomsg->iov,
-						&iomsg->msg.msg_iter);
+		ret = import_iovec(READ, (struct iovec __user *)uiov, len,
+				   UIO_FASTIOV, &iomsg->iov,
+				   &iomsg->msg.msg_iter);
 		if (ret < 0)
 			return ret;
 	}
diff --git a/fs/read_write.c b/fs/read_write.c
index f153116bc5399b..2f961c653ce561 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -752,6 +752,38 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
 	return ret;
 }
 
+static int compat_copy_iovecs_from_user(struct iovec *iov,
+		const struct iovec __user *uvector, unsigned long nr_segs)
+{
+	const struct compat_iovec __user *uiov +		(const struct compat_iovec __user *)uvector;
+	unsigned long i;
+	int ret = -EFAULT;
+
+	if (!user_access_begin(uvector, nr_segs * sizeof(*uvector)))
+		return -EFAULT;
+
+	for (i = 0; i < nr_segs; i++) {
+		compat_uptr_t buf;
+		compat_ssize_t len;
+
+		unsafe_get_user(len, &uiov[i].iov_len, out);
+		unsafe_get_user(buf, &uiov[i].iov_base, out);
+
+		/* check for compat_size_t not fitting in compat_ssize_t .. */
+		if (len < 0) {
+			ret = -EINVAL;
+			goto out;
+		}
+		iov[i].iov_base = compat_ptr(buf);
+		iov[i].iov_len = len;
+	}
+	ret = 0;
+out:
+	user_access_end();
+	return ret;
+}
+
 /**
  * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
  *     into the kernel and check that it is valid.
@@ -808,6 +840,7 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
 		ret = -EINVAL;
 		goto out;
 	}
+
 	if (nr_segs > fast_segs) {
 		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
 		if (iov = NULL) {
@@ -815,9 +848,16 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
 			goto out;
 		}
 	}
-	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
-		ret = -EFAULT;
-		goto out;
+
+	if (in_compat_syscall()) {
+		ret = compat_copy_iovecs_from_user(iov, uvector, nr_segs);
+		if (ret)
+			goto out;
+	} else {
+		if (copy_from_user(iov, uvector, nr_segs * sizeof(*uvector))) {
+			ret = -EFAULT;
+			goto out;
+		}
 	}
 
 	/*
@@ -855,81 +895,6 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
 	return ret;
 }
 
-#ifdef CONFIG_COMPAT
-ssize_t compat_rw_copy_check_uvector(int type,
-		const struct compat_iovec __user *uvector, unsigned long nr_segs,
-		unsigned long fast_segs, struct iovec *fast_pointer,
-		struct iovec **ret_pointer)
-{
-	compat_ssize_t tot_len;
-	struct iovec *iov = *ret_pointer = fast_pointer;
-	ssize_t ret = 0;
-	int seg;
-
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs = 0)
-		goto out;
-
-	ret = -EINVAL;
-	if (nr_segs > UIO_MAXIOV)
-		goto out;
-	if (nr_segs > fast_segs) {
-		ret = -ENOMEM;
-		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL)
-			goto out;
-	}
-	*ret_pointer = iov;
-
-	ret = -EFAULT;
-	if (!access_ok(uvector, nr_segs*sizeof(*uvector)))
-		goto out;
-
-	/*
-	 * Single unix specification:
-	 * We should -EINVAL if an element length is not >= 0 and fitting an
-	 * ssize_t.
-	 *
-	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
-	 * no overflow possibility.
-	 */
-	tot_len = 0;
-	ret = -EINVAL;
-	for (seg = 0; seg < nr_segs; seg++) {
-		compat_uptr_t buf;
-		compat_ssize_t len;
-
-		if (__get_user(len, &uvector->iov_len) ||
-		   __get_user(buf, &uvector->iov_base)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
-			goto out;
-		if (type != CHECK_IOVEC_ONLY &&
-		    !access_ok(compat_ptr(buf), len)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - tot_len)
-			len = MAX_RW_COUNT - tot_len;
-		tot_len += len;
-		iov->iov_base = compat_ptr(buf);
-		iov->iov_len = (compat_size_t) len;
-		uvector++;
-		iov++;
-	}
-	ret = tot_len;
-
-out:
-	return ret;
-}
-#endif
-
 static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
 		loff_t *pos, rwf_t flags)
 {
@@ -1256,7 +1221,8 @@ static size_t compat_readv(struct file *file,
 	struct iov_iter iter;
 	ssize_t ret;
 
-	ret = compat_import_iovec(READ, vec, vlen, UIO_FASTIOV, &iov, &iter);
+	ret = import_iovec(READ, (const struct iovec __user *)vec, vlen,
+			   UIO_FASTIOV, &iov, &iter);
 	if (ret >= 0) {
 		ret = do_iter_read(file, &iter, pos, flags);
 		kfree(iov);
@@ -1364,7 +1330,8 @@ static size_t compat_writev(struct file *file,
 	struct iov_iter iter;
 	ssize_t ret;
 
-	ret = compat_import_iovec(WRITE, vec, vlen, UIO_FASTIOV, &iov, &iter);
+	ret = import_iovec(WRITE, (const struct iovec __user *)vec, vlen,
+			   UIO_FASTIOV, &iov, &iter);
 	if (ret >= 0) {
 		file_start_write(file);
 		ret = do_iter_write(file, &iter, pos, flags);
diff --git a/fs/splice.c b/fs/splice.c
index d7c8a7c4db07ff..132d42b9871f9b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1387,7 +1387,7 @@ COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, io
 	if (error)
 		return error;
 
-	error = compat_import_iovec(type, iov32, nr_segs,
+	error = import_iovec(type, (struct iovec __user *)iov32, nr_segs,
 			     ARRAY_SIZE(iovstack), &iov, &iter);
 	if (error >= 0) {
 		error = do_vmsplice(f.file, &iter, flags);
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 69968c124b3cad..ad6dc56e8828d6 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -451,12 +451,6 @@ extern long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 
 struct epoll_event;	/* fortunately, this one is fixed-layout */
 
-extern ssize_t compat_rw_copy_check_uvector(int type,
-		const struct compat_iovec __user *uvector,
-		unsigned long nr_segs,
-		unsigned long fast_segs, struct iovec *fast_pointer,
-		struct iovec **ret_pointer);
-
 extern void __user *compat_alloc_user_space(unsigned long len);
 
 int compat_restore_altstack(const compat_stack_t __user *uss);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a082c..3cc0ee0de45648 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -179,10 +179,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 #define FMODE_BUF_RASYNC	((__force fmode_t)0x40000000)
 
 /*
- * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
- * that indicates that they should check the contents of the iovec are
- * valid, but not check the memory that the iovec elements
- * points too.
+ * Flag for rw_copy_check_uvector  that indicates that they should check the
+ * contents of the iovec are valid, but not check the memory that the iovec
+ * elements points too.
  */
 #define CHECK_IOVEC_ONLY -1
 
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 3835a8a8e9eae0..2c14e55687fec6 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -270,13 +270,6 @@ ssize_t import_iovec(int type, const struct iovec __user * uvector,
 		 unsigned nr_segs, unsigned fast_segs,
 		 struct iovec **iov, struct iov_iter *i);
 
-#ifdef CONFIG_COMPAT
-struct compat_iovec;
-ssize_t compat_import_iovec(int type, const struct compat_iovec __user * uvector,
-		 unsigned nr_segs, unsigned fast_segs,
-		 struct iovec **iov, struct iov_iter *i);
-#endif
-
 int import_single_range(int type, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i);
 
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5e40786c8f1232..792f31c1cd96ba 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -10,6 +10,7 @@
 #include <net/checksum.h>
 #include <linux/scatterlist.h>
 #include <linux/instrumented.h>
+#include <linux/compat.h>
 
 #define PIPE_PARANOIA /* for now */
 
@@ -1678,32 +1679,8 @@ ssize_t import_iovec(int type, const struct iovec __user * uvector,
 {
 	ssize_t n;
 	struct iovec *p;
-	n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
-				  *iov, &p);
-	if (n < 0) {
-		if (p != *iov)
-			kfree(p);
-		*iov = NULL;
-		return n;
-	}
-	iov_iter_init(i, type, p, nr_segs, n);
-	*iov = p = *iov ? NULL : p;
-	return n;
-}
-EXPORT_SYMBOL(import_iovec);
 
-#ifdef CONFIG_COMPAT
-#include <linux/compat.h>
-
-ssize_t compat_import_iovec(int type,
-		const struct compat_iovec __user * uvector,
-		unsigned nr_segs, unsigned fast_segs,
-		struct iovec **iov, struct iov_iter *i)
-{
-	ssize_t n;
-	struct iovec *p;
-	n = compat_rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
-				  *iov, &p);
+	n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs, *iov, &p);
 	if (n < 0) {
 		if (p != *iov)
 			kfree(p);
@@ -1714,8 +1691,7 @@ ssize_t compat_import_iovec(int type,
 	*iov = p = *iov ? NULL : p;
 	return n;
 }
-EXPORT_SYMBOL(compat_import_iovec);
-#endif
+EXPORT_SYMBOL(import_iovec);
 
 int import_single_range(int rw, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i)
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 29c052099affdc..f21feebbd48f39 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -329,14 +329,15 @@ compat_process_vm_rw(compat_pid_t pid,
 	if (flags != 0)
 		return -EINVAL;
 
-	rc = compat_import_iovec(dir, lvec, liovcnt, UIO_FASTIOV, &iov_l, &iter);
+	rc = import_iovec(dir, (const struct iovec __user *)lvec, liovcnt,
+			  UIO_FASTIOV, &iov_l, &iter);
 	if (rc < 0)
 		return rc;
 	if (!iov_iter_count(&iter))
 		goto free_iovecs;
-	rc = compat_rw_copy_check_uvector(CHECK_IOVEC_ONLY, rvec, riovcnt,
-					  UIO_FASTIOV, iovstack_r,
-					  &iov_r);
+	rc = rw_copy_check_uvector(CHECK_IOVEC_ONLY,
+				   (const struct iovec __user *)rvec, riovcnt,
+				   UIO_FASTIOV, iovstack_r, &iov_r);
 	if (rc <= 0)
 		goto free_iovecs;
 
diff --git a/net/compat.c b/net/compat.c
index 95ce707a30a31d..ddd15af3a2837b 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -98,8 +98,8 @@ int get_compat_msghdr(struct msghdr *kmsg,
 	if (err)
 		return err;
 
-	err = compat_import_iovec(save_addr ? READ : WRITE, compat_ptr(ptr),
-				   len, UIO_FASTIOV, iov, &kmsg->msg_iter);
+	err = import_iovec(save_addr ? READ : WRITE, compat_ptr(ptr), len,
+			   UIO_FASTIOV, iov, &kmsg->msg_iter);
 	return err < 0 ? err : 0;
 }
 
diff --git a/security/keys/compat.c b/security/keys/compat.c
index 6ee9d8f6a4a5bb..7ae531db031cf8 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -33,9 +33,8 @@ static long compat_keyctl_instantiate_key_iov(
 	if (!_payload_iov)
 		ioc = 0;
 
-	ret = compat_import_iovec(WRITE, _payload_iov, ioc,
-				  ARRAY_SIZE(iovstack), &iov,
-				  &from);
+	ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
+			   ioc, ARRAY_SIZE(iovstack), &iov, &from);
 	if (ret < 0)
 		return ret;
 
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 5/9] fs: remove various compat readv/writev helpers
  2020-09-18 12:45 let import_iovec deal with compat_iovecs as well Christoph Hellwig
                   ` (3 preceding siblings ...)
  2020-09-18 12:45 ` [PATCH 4/9] fs: handle the compat case in import_iovec Christoph Hellwig
@ 2020-09-18 12:45 ` Christoph Hellwig
  2020-09-18 12:45 ` [PATCH 6/9] fs: remove the compat readv/writev syscalls Christoph Hellwig
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-18 12:45 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

Now that import_iovec handles compat iovecs as well, all the duplicated
code in the compat readv/writev helpers is not needed.  Remove them
and switch the compat syscall handlers to use the native helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 179 ++++++++----------------------------------------
 1 file changed, 30 insertions(+), 149 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 2f961c653ce561..9eb63c53da78f2 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1211,226 +1211,107 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
 	return do_pwritev(fd, vec, vlen, pos, flags);
 }
 
+/*
+ * Various compat syscalls.  Note that they all pretend to take a native
+ * iovec - import_iovec will properly treat those as compat_iovecs based on
+ * in_compat_syscall().
+ */
 #ifdef CONFIG_COMPAT
-static size_t compat_readv(struct file *file,
-			   const struct compat_iovec __user *vec,
-			   unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
-	ssize_t ret;
-
-	ret = import_iovec(READ, (const struct iovec __user *)vec, vlen,
-			   UIO_FASTIOV, &iov, &iter);
-	if (ret >= 0) {
-		ret = do_iter_read(file, &iter, pos, flags);
-		kfree(iov);
-	}
-	if (ret > 0)
-		add_rchar(current, ret);
-	inc_syscr(current);
-	return ret;
-}
-
-static size_t do_compat_readv(compat_ulong_t fd,
-				 const struct compat_iovec __user *vec,
-				 compat_ulong_t vlen, rwf_t flags)
-{
-	struct fd f = fdget_pos(fd);
-	ssize_t ret;
-	loff_t pos;
-
-	if (!f.file)
-		return -EBADF;
-	pos = f.file->f_pos;
-	ret = compat_readv(f.file, vec, vlen, &pos, flags);
-	if (ret >= 0)
-		f.file->f_pos = pos;
-	fdput_pos(f);
-	return ret;
-
-}
-
 COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		compat_ulong_t, vlen)
 {
-	return do_compat_readv(fd, vec, vlen, 0);
-}
-
-static long do_compat_preadv64(unsigned long fd,
-				  const struct compat_iovec __user *vec,
-				  unsigned long vlen, loff_t pos, rwf_t flags)
-{
-	struct fd f;
-	ssize_t ret;
-
-	if (pos < 0)
-		return -EINVAL;
-	f = fdget(fd);
-	if (!f.file)
-		return -EBADF;
-	ret = -ESPIPE;
-	if (f.file->f_mode & FMODE_PREAD)
-		ret = compat_readv(f.file, vec, vlen, &pos, flags);
-	fdput(f);
-	return ret;
+	return do_readv(fd, vec, vlen, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return do_compat_preadv64(fd, vec, vlen, pos, 0);
+	return do_preadv(fd, vec, vlen, pos, 0);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return do_compat_preadv64(fd, vec, vlen, pos, 0);
+	return do_preadv(fd, vec, vlen, pos, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
 COMPAT_SYSCALL_DEFINE5(preadv64v2, unsigned long, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		unsigned long, vlen, loff_t, pos, rwf_t, flags)
 {
 	if (pos = -1)
-		return do_compat_readv(fd, vec, vlen, flags);
-
-	return do_compat_preadv64(fd, vec, vlen, pos, flags);
+		return do_readv(fd, vec, vlen, flags);
+	return do_preadv(fd, vec, vlen, pos, flags);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
 		rwf_t, flags)
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
 	if (pos = -1)
-		return do_compat_readv(fd, vec, vlen, flags);
-
-	return do_compat_preadv64(fd, vec, vlen, pos, flags);
-}
-
-static size_t compat_writev(struct file *file,
-			    const struct compat_iovec __user *vec,
-			    unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
-	ssize_t ret;
-
-	ret = import_iovec(WRITE, (const struct iovec __user *)vec, vlen,
-			   UIO_FASTIOV, &iov, &iter);
-	if (ret >= 0) {
-		file_start_write(file);
-		ret = do_iter_write(file, &iter, pos, flags);
-		file_end_write(file);
-		kfree(iov);
-	}
-	if (ret > 0)
-		add_wchar(current, ret);
-	inc_syscw(current);
-	return ret;
-}
-
-static size_t do_compat_writev(compat_ulong_t fd,
-				  const struct compat_iovec __user* vec,
-				  compat_ulong_t vlen, rwf_t flags)
-{
-	struct fd f = fdget_pos(fd);
-	ssize_t ret;
-	loff_t pos;
-
-	if (!f.file)
-		return -EBADF;
-	pos = f.file->f_pos;
-	ret = compat_writev(f.file, vec, vlen, &pos, flags);
-	if (ret >= 0)
-		f.file->f_pos = pos;
-	fdput_pos(f);
-	return ret;
+		return do_readv(fd, vec, vlen, flags);
+	return do_preadv(fd, vec, vlen, pos, flags);
 }
 
 COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
-		const struct compat_iovec __user *, vec,
+		const struct iovec __user *, vec,
 		compat_ulong_t, vlen)
 {
-	return do_compat_writev(fd, vec, vlen, 0);
-}
-
-static long do_compat_pwritev64(unsigned long fd,
-				   const struct compat_iovec __user *vec,
-				   unsigned long vlen, loff_t pos, rwf_t flags)
-{
-	struct fd f;
-	ssize_t ret;
-
-	if (pos < 0)
-		return -EINVAL;
-	f = fdget(fd);
-	if (!f.file)
-		return -EBADF;
-	ret = -ESPIPE;
-	if (f.file->f_mode & FMODE_PWRITE)
-		ret = compat_writev(f.file, vec, vlen, &pos, flags);
-	fdput(f);
-	return ret;
+	return do_writev(fd, vec, vlen, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64
 COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
+	return do_pwritev(fd, vec, vlen, pos, 0);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *,vec,
 		compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
+	return do_pwritev(fd, vec, vlen, pos, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64V2
 COMPAT_SYSCALL_DEFINE5(pwritev64v2, unsigned long, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		unsigned long, vlen, loff_t, pos, rwf_t, flags)
 {
 	if (pos = -1)
-		return do_compat_writev(fd, vec, vlen, flags);
-
-	return do_compat_pwritev64(fd, vec, vlen, pos, flags);
+		return do_writev(fd, vec, vlen, flags);
+	return do_pwritev(fd, vec, vlen, pos, flags);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *,vec,
 		compat_ulong_t, vlen, u32, pos_low, u32, pos_high, rwf_t, flags)
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
 	if (pos = -1)
-		return do_compat_writev(fd, vec, vlen, flags);
-
-	return do_compat_pwritev64(fd, vec, vlen, pos, flags);
+		return do_writev(fd, vec, vlen, flags);
+	return do_pwritev(fd, vec, vlen, pos, flags);
 }
-
-#endif
+#endif /* CONFIG_COMPAT */
 
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 		  	   size_t count, loff_t max)
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 6/9] fs: remove the compat readv/writev syscalls
  2020-09-18 12:45 let import_iovec deal with compat_iovecs as well Christoph Hellwig
                   ` (4 preceding siblings ...)
  2020-09-18 12:45 ` [PATCH 5/9] fs: remove various compat readv/writev helpers Christoph Hellwig
@ 2020-09-18 12:45 ` Christoph Hellwig
  2020-09-18 12:45 ` [PATCH 7/9] fs: remove compat_sys_vmsplice Christoph Hellwig
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-18 12:45 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native readv and writev
syscalls can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/unistd32.h                  |  4 ++--
 arch/mips/kernel/syscalls/syscall_n32.tbl          |  4 ++--
 arch/mips/kernel/syscalls/syscall_o32.tbl          |  4 ++--
 arch/parisc/kernel/syscalls/syscall.tbl            |  4 ++--
 arch/powerpc/kernel/syscalls/syscall.tbl           |  4 ++--
 arch/s390/kernel/syscalls/syscall.tbl              |  4 ++--
 arch/sparc/kernel/syscalls/syscall.tbl             |  4 ++--
 arch/x86/entry/syscall_x32.c                       |  2 ++
 arch/x86/entry/syscalls/syscall_32.tbl             |  4 ++--
 arch/x86/entry/syscalls/syscall_64.tbl             |  4 ++--
 fs/read_write.c                                    | 14 --------------
 include/linux/compat.h                             |  4 ----
 include/uapi/asm-generic/unistd.h                  |  4 ++--
 tools/include/uapi/asm-generic/unistd.h            |  4 ++--
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |  4 ++--
 tools/perf/arch/s390/entry/syscalls/syscall.tbl    |  4 ++--
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |  4 ++--
 17 files changed, 30 insertions(+), 46 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 734860ac7cf9d5..4a236493dca5b9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -301,9 +301,9 @@ __SYSCALL(__NR_flock, sys_flock)
 #define __NR_msync 144
 __SYSCALL(__NR_msync, sys_msync)
 #define __NR_readv 145
-__SYSCALL(__NR_readv, compat_sys_readv)
+__SYSCALL(__NR_readv, sys_readv)
 #define __NR_writev 146
-__SYSCALL(__NR_writev, compat_sys_writev)
+__SYSCALL(__NR_writev, sys_writev)
 #define __NR_getsid 147
 __SYSCALL(__NR_getsid, sys_getsid)
 #define __NR_fdatasync 148
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f9df9edb67a407..c99a92646f8ee9 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -25,8 +25,8 @@
 15	n32	ioctl				compat_sys_ioctl
 16	n32	pread64				sys_pread64
 17	n32	pwrite64			sys_pwrite64
-18	n32	readv				compat_sys_readv
-19	n32	writev				compat_sys_writev
+18	n32	readv				sys_readv
+19	n32	writev				sys_writev
 20	n32	access				sys_access
 21	n32	pipe				sysm_pipe
 22	n32	_newselect			compat_sys_select
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 195b43cf27c848..075064d10661bf 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -156,8 +156,8 @@
 142	o32	_newselect			sys_select			compat_sys_select
 143	o32	flock				sys_flock
 144	o32	msync				sys_msync
-145	o32	readv				sys_readv			compat_sys_readv
-146	o32	writev				sys_writev			compat_sys_writev
+145	o32	readv				sys_readv
+146	o32	writev				sys_writev
 147	o32	cacheflush			sys_cacheflush
 148	o32	cachectl			sys_cachectl
 149	o32	sysmips				__sys_sysmips
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index def64d221cd4fb..192abde0001d9d 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -159,8 +159,8 @@
 142	common	_newselect		sys_select			compat_sys_select
 143	common	flock			sys_flock
 144	common	msync			sys_msync
-145	common	readv			sys_readv			compat_sys_readv
-146	common	writev			sys_writev			compat_sys_writev
+145	common	readv			sys_readv
+146	common	writev			sys_writev
 147	common	getsid			sys_getsid
 148	common	fdatasync		sys_fdatasync
 149	common	_sysctl			sys_ni_syscall
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index c2d737ff2e7bec..6f1e2ecf0edad9 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -193,8 +193,8 @@
 142	common	_newselect			sys_select			compat_sys_select
 143	common	flock				sys_flock
 144	common	msync				sys_msync
-145	common	readv				sys_readv			compat_sys_readv
-146	common	writev				sys_writev			compat_sys_writev
+145	common	readv				sys_readv
+146	common	writev				sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
 149	nospu	_sysctl				sys_ni_syscall
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 10456bc936fb09..6101cf2e004cb4 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -134,8 +134,8 @@
 142  64		select			sys_select			-
 143  common	flock			sys_flock			sys_flock
 144  common	msync			sys_msync			sys_msync
-145  common	readv			sys_readv			compat_sys_readv
-146  common	writev			sys_writev			compat_sys_writev
+145  common	readv			sys_readv			sys_readv
+146  common	writev			sys_writev			sys_writev
 147  common	getsid			sys_getsid			sys_getsid
 148  common	fdatasync		sys_fdatasync			sys_fdatasync
 149  common	_sysctl			-				-
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 4af114e84f2022..a87ddb282ab16f 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -149,8 +149,8 @@
 117	common	getrusage		sys_getrusage			compat_sys_getrusage
 118	common	getsockopt		sys_getsockopt			sys_getsockopt
 119	common	getcwd			sys_getcwd
-120	common	readv			sys_readv			compat_sys_readv
-121	common	writev			sys_writev			compat_sys_writev
+120	common	readv			sys_readv
+121	common	writev			sys_writev
 122	common	settimeofday		sys_settimeofday		compat_sys_settimeofday
 123	32	fchown			sys_fchown16
 123	64	fchown			sys_fchown
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 1583831f61a9df..aa321444a41f63 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -12,6 +12,8 @@
  * Reuse the 64-bit entry points for the x32 versions that occupy different
  * slots in the syscall table.
  */
+#define __x32_sys_readv		__x64_sys_readv
+#define __x32_sys_writev	__x64_sys_writev
 #define __x32_sys_getsockopt	__x64_sys_getsockopt
 #define __x32_sys_setsockopt	__x64_sys_setsockopt
 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 9d11028736661b..54ab4beb517f25 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -156,8 +156,8 @@
 142	i386	_newselect		sys_select			compat_sys_select
 143	i386	flock			sys_flock
 144	i386	msync			sys_msync
-145	i386	readv			sys_readv			compat_sys_readv
-146	i386	writev			sys_writev			compat_sys_writev
+145	i386	readv			sys_readv
+146	i386	writev			sys_writev
 147	i386	getsid			sys_getsid
 148	i386	fdatasync		sys_fdatasync
 149	i386	_sysctl			sys_ni_syscall
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index f30d6ae9a6883c..b1e59957c5c51c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -371,8 +371,8 @@
 512	x32	rt_sigaction		compat_sys_rt_sigaction
 513	x32	rt_sigreturn		compat_sys_x32_rt_sigreturn
 514	x32	ioctl			compat_sys_ioctl
-515	x32	readv			compat_sys_readv
-516	x32	writev			compat_sys_writev
+515	x32	readv			sys_readv
+516	x32	writev			sys_writev
 517	x32	recvfrom		compat_sys_recvfrom
 518	x32	sendmsg			compat_sys_sendmsg
 519	x32	recvmsg			compat_sys_recvmsg
diff --git a/fs/read_write.c b/fs/read_write.c
index 9eb63c53da78f2..560d1b0bdef7bc 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1217,13 +1217,6 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
  * in_compat_syscall().
  */
 #ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-		const struct iovec __user *, vec,
-		compat_ulong_t, vlen)
-{
-	return do_readv(fd, vec, vlen, 0);
-}
-
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
 		const struct iovec __user *, vec,
@@ -1265,13 +1258,6 @@ COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
 	return do_preadv(fd, vec, vlen, pos, flags);
 }
 
-COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
-		const struct iovec __user *, vec,
-		compat_ulong_t, vlen)
-{
-	return do_writev(fd, vec, vlen, 0);
-}
-
 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64
 COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
 		const struct iovec __user *, vec,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index ad6dc56e8828d6..0ff848234df8ba 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -545,10 +545,6 @@ asmlinkage long compat_sys_getdents(unsigned int fd,
 
 /* fs/read_write.c */
 asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);
-asmlinkage ssize_t compat_sys_readv(compat_ulong_t fd,
-		const struct compat_iovec __user *vec, compat_ulong_t vlen);
-asmlinkage ssize_t compat_sys_writev(compat_ulong_t fd,
-		const struct compat_iovec __user *vec, compat_ulong_t vlen);
 /* No generic prototype for pread64 and pwrite64 */
 asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
 		const struct compat_iovec __user *vec,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 995b36c2ea7d8a..211c9eacbda6eb 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -207,9 +207,9 @@ __SYSCALL(__NR_read, sys_read)
 #define __NR_write 64
 __SYSCALL(__NR_write, sys_write)
 #define __NR_readv 65
-__SC_COMP(__NR_readv, sys_readv, compat_sys_readv)
+__SC_COMP(__NR_readv, sys_readv, sys_readv)
 #define __NR_writev 66
-__SC_COMP(__NR_writev, sys_writev, compat_sys_writev)
+__SC_COMP(__NR_writev, sys_writev, sys_writev)
 #define __NR_pread64 67
 __SC_COMP(__NR_pread64, sys_pread64, compat_sys_pread64)
 #define __NR_pwrite64 68
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 995b36c2ea7d8a..211c9eacbda6eb 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -207,9 +207,9 @@ __SYSCALL(__NR_read, sys_read)
 #define __NR_write 64
 __SYSCALL(__NR_write, sys_write)
 #define __NR_readv 65
-__SC_COMP(__NR_readv, sys_readv, compat_sys_readv)
+__SC_COMP(__NR_readv, sys_readv, sys_readv)
 #define __NR_writev 66
-__SC_COMP(__NR_writev, sys_writev, compat_sys_writev)
+__SC_COMP(__NR_writev, sys_writev, sys_writev)
 #define __NR_pread64 67
 __SC_COMP(__NR_pread64, sys_pread64, compat_sys_pread64)
 #define __NR_pwrite64 68
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 3ca6fe057a0b1f..46be68029587f9 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -189,8 +189,8 @@
 142	common	_newselect			sys_select			compat_sys_select
 143	common	flock				sys_flock
 144	common	msync				sys_msync
-145	common	readv				sys_readv			compat_sys_readv
-146	common	writev				sys_writev			compat_sys_writev
+145	common	readv				sys_readv
+146	common	writev				sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
 149	nospu	_sysctl				sys_ni_syscall
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index 6a0bbea225db0d..fb5e61ce9d5838 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -134,8 +134,8 @@
 142  64		select			sys_select			-
 143  common	flock			sys_flock			sys_flock
 144  common	msync			sys_msync			compat_sys_msync
-145  common	readv			sys_readv			compat_sys_readv
-146  common	writev			sys_writev			compat_sys_writev
+145  common	readv			sys_readv
+146  common	writev			sys_writev
 147  common	getsid			sys_getsid			sys_getsid
 148  common	fdatasync		sys_fdatasync			sys_fdatasync
 149  common	_sysctl			-				-
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index f30d6ae9a6883c..b1e59957c5c51c 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -371,8 +371,8 @@
 512	x32	rt_sigaction		compat_sys_rt_sigaction
 513	x32	rt_sigreturn		compat_sys_x32_rt_sigreturn
 514	x32	ioctl			compat_sys_ioctl
-515	x32	readv			compat_sys_readv
-516	x32	writev			compat_sys_writev
+515	x32	readv			sys_readv
+516	x32	writev			sys_writev
 517	x32	recvfrom		compat_sys_recvfrom
 518	x32	sendmsg			compat_sys_sendmsg
 519	x32	recvmsg			compat_sys_recvmsg
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 7/9] fs: remove compat_sys_vmsplice
  2020-09-18 12:45 let import_iovec deal with compat_iovecs as well Christoph Hellwig
                   ` (5 preceding siblings ...)
  2020-09-18 12:45 ` [PATCH 6/9] fs: remove the compat readv/writev syscalls Christoph Hellwig
@ 2020-09-18 12:45 ` Christoph Hellwig
  2020-09-18 12:45 ` [PATCH 8/9] mm: remove compat_process_vm_{readv,writev} Christoph Hellwig
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-18 12:45 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native vmsplice syscall
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/unistd32.h             |  2 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl     |  2 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl     |  2 +-
 arch/parisc/kernel/syscalls/syscall.tbl       |  2 +-
 arch/powerpc/kernel/syscalls/syscall.tbl      |  2 +-
 arch/s390/kernel/syscalls/syscall.tbl         |  2 +-
 arch/sparc/kernel/syscalls/syscall.tbl        |  2 +-
 arch/x86/entry/syscall_x32.c                  |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl        |  2 +-
 arch/x86/entry/syscalls/syscall_64.tbl        |  2 +-
 fs/splice.c                                   | 57 +++++--------------
 include/linux/compat.h                        |  4 --
 include/uapi/asm-generic/unistd.h             |  2 +-
 tools/include/uapi/asm-generic/unistd.h       |  2 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  2 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  2 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl    |  2 +-
 17 files changed, 28 insertions(+), 62 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 4a236493dca5b9..11dfae3a8563bd 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -697,7 +697,7 @@ __SYSCALL(__NR_sync_file_range2, compat_sys_aarch32_sync_file_range2)
 #define __NR_tee 342
 __SYSCALL(__NR_tee, sys_tee)
 #define __NR_vmsplice 343
-__SYSCALL(__NR_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages 344
 __SYSCALL(__NR_move_pages, compat_sys_move_pages)
 #define __NR_getcpu 345
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index c99a92646f8ee9..5a39d4de0ac85b 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -278,7 +278,7 @@
 267	n32	splice				sys_splice
 268	n32	sync_file_range			sys_sync_file_range
 269	n32	tee				sys_tee
-270	n32	vmsplice			compat_sys_vmsplice
+270	n32	vmsplice			sys_vmsplice
 271	n32	move_pages			compat_sys_move_pages
 272	n32	set_robust_list			compat_sys_set_robust_list
 273	n32	get_robust_list			compat_sys_get_robust_list
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 075064d10661bf..136efc6b8c5444 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -318,7 +318,7 @@
 304	o32	splice				sys_splice
 305	o32	sync_file_range			sys_sync_file_range		sys32_sync_file_range
 306	o32	tee				sys_tee
-307	o32	vmsplice			sys_vmsplice			compat_sys_vmsplice
+307	o32	vmsplice			sys_vmsplice
 308	o32	move_pages			sys_move_pages			compat_sys_move_pages
 309	o32	set_robust_list			sys_set_robust_list		compat_sys_set_robust_list
 310	o32	get_robust_list			sys_get_robust_list		compat_sys_get_robust_list
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 192abde0001d9d..a9e184192caedd 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -330,7 +330,7 @@
 292	32	sync_file_range		parisc_sync_file_range
 292	64	sync_file_range		sys_sync_file_range
 293	common	tee			sys_tee
-294	common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+294	common	vmsplice		sys_vmsplice
 295	common	move_pages		sys_move_pages			compat_sys_move_pages
 296	common	getcpu			sys_getcpu
 297	common	epoll_pwait		sys_epoll_pwait			compat_sys_epoll_pwait
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 6f1e2ecf0edad9..0d4985919ca34d 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -369,7 +369,7 @@
 282	common	unshare				sys_unshare
 283	common	splice				sys_splice
 284	common	tee				sys_tee
-285	common	vmsplice			sys_vmsplice			compat_sys_vmsplice
+285	common	vmsplice			sys_vmsplice
 286	common	openat				sys_openat			compat_sys_openat
 287	common	mkdirat				sys_mkdirat
 288	common	mknodat				sys_mknodat
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 6101cf2e004cb4..b5495a42814bd1 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -316,7 +316,7 @@
 306  common	splice			sys_splice			sys_splice
 307  common	sync_file_range		sys_sync_file_range		compat_sys_s390_sync_file_range
 308  common	tee			sys_tee				sys_tee
-309  common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+309  common	vmsplice		sys_vmsplice			sys_vmsplice
 310  common	move_pages		sys_move_pages			compat_sys_move_pages
 311  common	getcpu			sys_getcpu			sys_getcpu
 312  common	epoll_pwait		sys_epoll_pwait			compat_sys_epoll_pwait
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index a87ddb282ab16f..f1810c1a35caa5 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -38,7 +38,7 @@
 23	64    	setuid			sys_setuid
 24	32	getuid			sys_getuid16
 24	64   	getuid			sys_getuid
-25	common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+25	common	vmsplice		sys_vmsplice
 26	common	ptrace			sys_ptrace			compat_sys_ptrace
 27	common	alarm			sys_alarm
 28	common	sigaltstack		sys_sigaltstack			compat_sys_sigaltstack
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index aa321444a41f63..a4840b9d50ad14 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -16,6 +16,7 @@
 #define __x32_sys_writev	__x64_sys_writev
 #define __x32_sys_getsockopt	__x64_sys_getsockopt
 #define __x32_sys_setsockopt	__x64_sys_setsockopt
+#define __x32_sys_vmsplice	__x64_sys_vmsplice
 
 #define __SYSCALL_64(nr, sym)
 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 54ab4beb517f25..0fb2f172581e51 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -327,7 +327,7 @@
 313	i386	splice			sys_splice
 314	i386	sync_file_range		sys_ia32_sync_file_range
 315	i386	tee			sys_tee
-316	i386	vmsplice		sys_vmsplice			compat_sys_vmsplice
+316	i386	vmsplice		sys_vmsplice
 317	i386	move_pages		sys_move_pages			compat_sys_move_pages
 318	i386	getcpu			sys_getcpu
 319	i386	epoll_pwait		sys_epoll_pwait
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index b1e59957c5c51c..642af919183de4 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -388,7 +388,7 @@
 529	x32	waitid			compat_sys_waitid
 530	x32	set_robust_list		compat_sys_set_robust_list
 531	x32	get_robust_list		compat_sys_get_robust_list
-532	x32	vmsplice		compat_sys_vmsplice
+532	x32	vmsplice		sys_vmsplice
 533	x32	move_pages		compat_sys_move_pages
 534	x32	preadv			compat_sys_preadv64
 535	x32	pwritev			compat_sys_pwritev64
diff --git a/fs/splice.c b/fs/splice.c
index 132d42b9871f9b..18d84544030b39 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -33,7 +33,6 @@
 #include <linux/security.h>
 #include <linux/gfp.h>
 #include <linux/socket.h>
-#include <linux/compat.h>
 #include <linux/sched/signal.h>
 
 #include "internal.h"
@@ -1332,20 +1331,6 @@ static int vmsplice_type(struct fd f, int *type)
  * Currently we punt and implement it as a normal copy, see pipe_to_user().
  *
  */
-static long do_vmsplice(struct file *f, struct iov_iter *iter, unsigned int flags)
-{
-	if (unlikely(flags & ~SPLICE_F_ALL))
-		return -EINVAL;
-
-	if (!iov_iter_count(iter))
-		return 0;
-
-	if (iov_iter_rw(iter) = WRITE)
-		return vmsplice_to_pipe(f, iter, flags);
-	else
-		return vmsplice_to_user(f, iter, flags);
-}
-
 SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 		unsigned long, nr_segs, unsigned int, flags)
 {
@@ -1356,6 +1341,9 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 	struct fd f;
 	int type;
 
+	if (unlikely(flags & ~SPLICE_F_ALL))
+		return -EINVAL;
+
 	f = fdget(fd);
 	error = vmsplice_type(f, &type);
 	if (error)
@@ -1363,40 +1351,21 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 
 	error = import_iovec(type, uiov, nr_segs,
 			     ARRAY_SIZE(iovstack), &iov, &iter);
-	if (error >= 0) {
-		error = do_vmsplice(f.file, &iter, flags);
-		kfree(iov);
-	}
-	fdput(f);
-	return error;
-}
+	if (error < 0)
+		goto out_fdput;
 
-#ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, iov32,
-		    unsigned int, nr_segs, unsigned int, flags)
-{
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
-	ssize_t error;
-	struct fd f;
-	int type;
-
-	f = fdget(fd);
-	error = vmsplice_type(f, &type);
-	if (error)
-		return error;
+	if (!iov_iter_count(&iter))
+		error = 0;
+	else if (iov_iter_rw(&iter) = WRITE)
+		error = vmsplice_to_pipe(f.file, &iter, flags);
+	else
+		error = vmsplice_to_user(f.file, &iter, flags);
 
-	error = import_iovec(type, (struct iovec __user *)iov32, nr_segs,
-			     ARRAY_SIZE(iovstack), &iov, &iter);
-	if (error >= 0) {
-		error = do_vmsplice(f.file, &iter, flags);
-		kfree(iov);
-	}
+	kfree(iov);
+out_fdput:
 	fdput(f);
 	return error;
 }
-#endif
 
 SYSCALL_DEFINE6(splice, int, fd_in, loff_t __user *, off_in,
 		int, fd_out, loff_t __user *, off_out,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 0ff848234df8ba..a7af6ed06cb000 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -597,10 +597,6 @@ asmlinkage long compat_sys_signalfd4(int ufd,
 				     const compat_sigset_t __user *sigmask,
 				     compat_size_t sigsetsize, int flags);
 
-/* fs/splice.c */
-asmlinkage long compat_sys_vmsplice(int fd, const struct compat_iovec __user *,
-				    unsigned int nr_segs, unsigned int flags);
-
 /* fs/stat.c */
 asmlinkage long compat_sys_newfstatat(unsigned int dfd,
 				      const char __user *filename,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 211c9eacbda6eb..f2dcb0d5703014 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -237,7 +237,7 @@ __SC_COMP(__NR_signalfd4, sys_signalfd4, compat_sys_signalfd4)
 
 /* fs/splice.c */
 #define __NR_vmsplice 75
-__SC_COMP(__NR_vmsplice, sys_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_splice 76
 __SYSCALL(__NR_splice, sys_splice)
 #define __NR_tee 77
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 211c9eacbda6eb..f2dcb0d5703014 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -237,7 +237,7 @@ __SC_COMP(__NR_signalfd4, sys_signalfd4, compat_sys_signalfd4)
 
 /* fs/splice.c */
 #define __NR_vmsplice 75
-__SC_COMP(__NR_vmsplice, sys_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_splice 76
 __SYSCALL(__NR_splice, sys_splice)
 #define __NR_tee 77
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 46be68029587f9..26f0347c15118b 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -363,7 +363,7 @@
 282	common	unshare				sys_unshare
 283	common	splice				sys_splice
 284	common	tee				sys_tee
-285	common	vmsplice			sys_vmsplice			compat_sys_vmsplice
+285	common	vmsplice			sys_vmsplice
 286	common	openat				sys_openat			compat_sys_openat
 287	common	mkdirat				sys_mkdirat
 288	common	mknodat				sys_mknodat
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index fb5e61ce9d5838..02ad81f69bb7e3 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -316,7 +316,7 @@
 306  common	splice			sys_splice			compat_sys_splice
 307  common	sync_file_range		sys_sync_file_range		compat_sys_s390_sync_file_range
 308  common	tee			sys_tee				compat_sys_tee
-309  common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+309  common	vmsplice		sys_vmsplice			sys_vmsplice
 310  common	move_pages		sys_move_pages			compat_sys_move_pages
 311  common	getcpu			sys_getcpu			compat_sys_getcpu
 312  common	epoll_pwait		sys_epoll_pwait			compat_sys_epoll_pwait
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index b1e59957c5c51c..642af919183de4 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -388,7 +388,7 @@
 529	x32	waitid			compat_sys_waitid
 530	x32	set_robust_list		compat_sys_set_robust_list
 531	x32	get_robust_list		compat_sys_get_robust_list
-532	x32	vmsplice		compat_sys_vmsplice
+532	x32	vmsplice		sys_vmsplice
 533	x32	move_pages		compat_sys_move_pages
 534	x32	preadv			compat_sys_preadv64
 535	x32	pwritev			compat_sys_pwritev64
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 8/9] mm: remove compat_process_vm_{readv,writev}
  2020-09-18 12:45 let import_iovec deal with compat_iovecs as well Christoph Hellwig
                   ` (6 preceding siblings ...)
  2020-09-18 12:45 ` [PATCH 7/9] fs: remove compat_sys_vmsplice Christoph Hellwig
@ 2020-09-18 12:45 ` Christoph Hellwig
  2020-09-18 13:48   ` Arnd Bergmann
  2020-09-18 12:45 ` [PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov Christoph Hellwig
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-18 12:45 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native syscalls
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/unistd32.h             |  4 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl     |  4 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl     |  4 +-
 arch/parisc/kernel/syscalls/syscall.tbl       |  4 +-
 arch/powerpc/kernel/syscalls/syscall.tbl      |  4 +-
 arch/s390/kernel/syscalls/syscall.tbl         |  4 +-
 arch/sparc/kernel/syscalls/syscall.tbl        |  4 +-
 arch/x86/entry/syscall_x32.c                  |  2 +
 arch/x86/entry/syscalls/syscall_32.tbl        |  4 +-
 arch/x86/entry/syscalls/syscall_64.tbl        |  4 +-
 include/linux/compat.h                        |  8 ---
 include/uapi/asm-generic/unistd.h             |  6 +-
 mm/process_vm_access.c                        | 70 -------------------
 tools/include/uapi/asm-generic/unistd.h       |  6 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  4 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  4 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl    |  4 +-
 17 files changed, 30 insertions(+), 110 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 11dfae3a8563bd..0c280a05f699bf 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -763,9 +763,9 @@ __SYSCALL(__NR_sendmmsg, compat_sys_sendmmsg)
 #define __NR_setns 375
 __SYSCALL(__NR_setns, sys_setns)
 #define __NR_process_vm_readv 376
-__SYSCALL(__NR_process_vm_readv, compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 377
-__SYSCALL(__NR_process_vm_writev, compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 378
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 379
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 5a39d4de0ac85b..0bc2e0fcf1ee56 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -317,8 +317,8 @@
 306	n32	syncfs				sys_syncfs
 307	n32	sendmmsg			compat_sys_sendmmsg
 308	n32	setns				sys_setns
-309	n32	process_vm_readv		compat_sys_process_vm_readv
-310	n32	process_vm_writev		compat_sys_process_vm_writev
+309	n32	process_vm_readv		sys_process_vm_readv
+310	n32	process_vm_writev		sys_process_vm_writev
 311	n32	kcmp				sys_kcmp
 312	n32	finit_module			sys_finit_module
 313	n32	sched_setattr			sys_sched_setattr
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 136efc6b8c5444..b408c13b934296 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -356,8 +356,8 @@
 342	o32	syncfs				sys_syncfs
 343	o32	sendmmsg			sys_sendmmsg			compat_sys_sendmmsg
 344	o32	setns				sys_setns
-345	o32	process_vm_readv		sys_process_vm_readv		compat_sys_process_vm_readv
-346	o32	process_vm_writev		sys_process_vm_writev		compat_sys_process_vm_writev
+345	o32	process_vm_readv		sys_process_vm_readv
+346	o32	process_vm_writev		sys_process_vm_writev
 347	o32	kcmp				sys_kcmp
 348	o32	finit_module			sys_finit_module
 349	o32	sched_setattr			sys_sched_setattr
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index a9e184192caedd..2015a5124b78ad 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -372,8 +372,8 @@
 327	common	syncfs			sys_syncfs
 328	common	setns			sys_setns
 329	common	sendmmsg		sys_sendmmsg			compat_sys_sendmmsg
-330	common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-331	common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+330	common	process_vm_readv	sys_process_vm_readv
+331	common	process_vm_writev	sys_process_vm_writev
 332	common	kcmp			sys_kcmp
 333	common	finit_module		sys_finit_module
 334	common	sched_setattr		sys_sched_setattr
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 0d4985919ca34d..66a472aa635d3f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -449,8 +449,8 @@
 348	common	syncfs				sys_syncfs
 349	common	sendmmsg			sys_sendmmsg			compat_sys_sendmmsg
 350	common	setns				sys_setns
-351	nospu	process_vm_readv		sys_process_vm_readv		compat_sys_process_vm_readv
-352	nospu	process_vm_writev		sys_process_vm_writev		compat_sys_process_vm_writev
+351	nospu	process_vm_readv		sys_process_vm_readv
+352	nospu	process_vm_writev		sys_process_vm_writev
 353	nospu	finit_module			sys_finit_module
 354	nospu	kcmp				sys_kcmp
 355	common	sched_setattr			sys_sched_setattr
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index b5495a42814bd1..7485867a490bb2 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -347,8 +347,8 @@
 337  common	clock_adjtime		sys_clock_adjtime		sys_clock_adjtime32
 338  common	syncfs			sys_syncfs			sys_syncfs
 339  common	setns			sys_setns			sys_setns
-340  common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-341  common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+340  common	process_vm_readv	sys_process_vm_readv		sys_process_vm_readv
+341  common	process_vm_writev	sys_process_vm_writev		sys_process_vm_writev
 342  common	s390_runtime_instr	sys_s390_runtime_instr		sys_s390_runtime_instr
 343  common	kcmp			sys_kcmp			sys_kcmp
 344  common	finit_module		sys_finit_module		sys_finit_module
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index f1810c1a35caa5..4a9365b2e340b2 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -406,8 +406,8 @@
 335	common	syncfs			sys_syncfs
 336	common	sendmmsg		sys_sendmmsg			compat_sys_sendmmsg
 337	common	setns			sys_setns
-338	common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-339	common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+338	common	process_vm_readv	sys_process_vm_readv
+339	common	process_vm_writev	sys_process_vm_writev
 340	32	kern_features		sys_ni_syscall			sys_kern_features
 340	64	kern_features		sys_kern_features
 341	common	kcmp			sys_kcmp
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index a4840b9d50ad14..f2fe0a33bcfdd5 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -17,6 +17,8 @@
 #define __x32_sys_getsockopt	__x64_sys_getsockopt
 #define __x32_sys_setsockopt	__x64_sys_setsockopt
 #define __x32_sys_vmsplice	__x64_sys_vmsplice
+#define __x32_sys_process_vm_readv	__x64_sys_process_vm_readv
+#define __x32_sys_process_vm_writev	__x64_sys_process_vm_writev
 
 #define __SYSCALL_64(nr, sym)
 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 0fb2f172581e51..5fbe10ad8a23fc 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -358,8 +358,8 @@
 344	i386	syncfs			sys_syncfs
 345	i386	sendmmsg		sys_sendmmsg			compat_sys_sendmmsg
 346	i386	setns			sys_setns
-347	i386	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-348	i386	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+347	i386	process_vm_readv	sys_process_vm_readv
+348	i386	process_vm_writev	sys_process_vm_writev
 349	i386	kcmp			sys_kcmp
 350	i386	finit_module		sys_finit_module
 351	i386	sched_setattr		sys_sched_setattr
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 642af919183de4..347809649ba28f 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -395,8 +395,8 @@
 536	x32	rt_tgsigqueueinfo	compat_sys_rt_tgsigqueueinfo
 537	x32	recvmmsg		compat_sys_recvmmsg_time64
 538	x32	sendmmsg		compat_sys_sendmmsg
-539	x32	process_vm_readv	compat_sys_process_vm_readv
-540	x32	process_vm_writev	compat_sys_process_vm_writev
+539	x32	process_vm_readv	sys_process_vm_readv
+540	x32	process_vm_writev	sys_process_vm_writev
 541	x32	setsockopt		sys_setsockopt
 542	x32	getsockopt		sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
diff --git a/include/linux/compat.h b/include/linux/compat.h
index a7af6ed06cb000..bbb065e09b058a 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -780,14 +780,6 @@ asmlinkage long compat_sys_open_by_handle_at(int mountdirfd,
 					     int flags);
 asmlinkage long compat_sys_sendmmsg(int fd, struct compat_mmsghdr __user *mmsg,
 				    unsigned vlen, unsigned int flags);
-asmlinkage ssize_t compat_sys_process_vm_readv(compat_pid_t pid,
-		const struct compat_iovec __user *lvec,
-		compat_ulong_t liovcnt, const struct compat_iovec __user *rvec,
-		compat_ulong_t riovcnt, compat_ulong_t flags);
-asmlinkage ssize_t compat_sys_process_vm_writev(compat_pid_t pid,
-		const struct compat_iovec __user *lvec,
-		compat_ulong_t liovcnt, const struct compat_iovec __user *rvec,
-		compat_ulong_t riovcnt, compat_ulong_t flags);
 asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
 		     const compat_uptr_t __user *argv,
 		     const compat_uptr_t __user *envp, int flags);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f2dcb0d5703014..c1dfe99c9c3f70 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -727,11 +727,9 @@ __SYSCALL(__NR_setns, sys_setns)
 #define __NR_sendmmsg 269
 __SC_COMP(__NR_sendmmsg, sys_sendmmsg, compat_sys_sendmmsg)
 #define __NR_process_vm_readv 270
-__SC_COMP(__NR_process_vm_readv, sys_process_vm_readv, \
-          compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 271
-__SC_COMP(__NR_process_vm_writev, sys_process_vm_writev, \
-          compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 272
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 273
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index f21feebbd48f39..752bb724f10efa 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -14,10 +14,6 @@
 #include <linux/slab.h>
 #include <linux/syscalls.h>
 
-#ifdef CONFIG_COMPAT
-#include <linux/compat.h>
-#endif
-
 /**
  * process_vm_rw_pages - read/write pages from task specified
  * @pages: array of pointers to pages we want to copy
@@ -307,69 +303,3 @@ SYSCALL_DEFINE6(process_vm_writev, pid_t, pid,
 {
 	return process_vm_rw(pid, lvec, liovcnt, rvec, riovcnt, flags, 1);
 }
-
-#ifdef CONFIG_COMPAT
-
-static ssize_t
-compat_process_vm_rw(compat_pid_t pid,
-		     const struct compat_iovec __user *lvec,
-		     unsigned long liovcnt,
-		     const struct compat_iovec __user *rvec,
-		     unsigned long riovcnt,
-		     unsigned long flags, int vm_write)
-{
-	struct iovec iovstack_l[UIO_FASTIOV];
-	struct iovec iovstack_r[UIO_FASTIOV];
-	struct iovec *iov_l = iovstack_l;
-	struct iovec *iov_r = iovstack_r;
-	struct iov_iter iter;
-	ssize_t rc = -EFAULT;
-	int dir = vm_write ? WRITE : READ;
-
-	if (flags != 0)
-		return -EINVAL;
-
-	rc = import_iovec(dir, (const struct iovec __user *)lvec, liovcnt,
-			  UIO_FASTIOV, &iov_l, &iter);
-	if (rc < 0)
-		return rc;
-	if (!iov_iter_count(&iter))
-		goto free_iovecs;
-	rc = rw_copy_check_uvector(CHECK_IOVEC_ONLY,
-				   (const struct iovec __user *)rvec, riovcnt,
-				   UIO_FASTIOV, iovstack_r, &iov_r);
-	if (rc <= 0)
-		goto free_iovecs;
-
-	rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
-
-free_iovecs:
-	if (iov_r != iovstack_r)
-		kfree(iov_r);
-	kfree(iov_l);
-	return rc;
-}
-
-COMPAT_SYSCALL_DEFINE6(process_vm_readv, compat_pid_t, pid,
-		       const struct compat_iovec __user *, lvec,
-		       compat_ulong_t, liovcnt,
-		       const struct compat_iovec __user *, rvec,
-		       compat_ulong_t, riovcnt,
-		       compat_ulong_t, flags)
-{
-	return compat_process_vm_rw(pid, lvec, liovcnt, rvec,
-				    riovcnt, flags, 0);
-}
-
-COMPAT_SYSCALL_DEFINE6(process_vm_writev, compat_pid_t, pid,
-		       const struct compat_iovec __user *, lvec,
-		       compat_ulong_t, liovcnt,
-		       const struct compat_iovec __user *, rvec,
-		       compat_ulong_t, riovcnt,
-		       compat_ulong_t, flags)
-{
-	return compat_process_vm_rw(pid, lvec, liovcnt, rvec,
-				    riovcnt, flags, 1);
-}
-
-#endif
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index f2dcb0d5703014..c1dfe99c9c3f70 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -727,11 +727,9 @@ __SYSCALL(__NR_setns, sys_setns)
 #define __NR_sendmmsg 269
 __SC_COMP(__NR_sendmmsg, sys_sendmmsg, compat_sys_sendmmsg)
 #define __NR_process_vm_readv 270
-__SC_COMP(__NR_process_vm_readv, sys_process_vm_readv, \
-          compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 271
-__SC_COMP(__NR_process_vm_writev, sys_process_vm_writev, \
-          compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 272
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 273
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 26f0347c15118b..a188f053cbf90a 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -443,8 +443,8 @@
 348	common	syncfs				sys_syncfs
 349	common	sendmmsg			sys_sendmmsg			compat_sys_sendmmsg
 350	common	setns				sys_setns
-351	nospu	process_vm_readv		sys_process_vm_readv		compat_sys_process_vm_readv
-352	nospu	process_vm_writev		sys_process_vm_writev		compat_sys_process_vm_writev
+351	nospu	process_vm_readv		sys_process_vm_readv
+352	nospu	process_vm_writev		sys_process_vm_writev
 353	nospu	finit_module			sys_finit_module
 354	nospu	kcmp				sys_kcmp
 355	common	sched_setattr			sys_sched_setattr
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index 02ad81f69bb7e3..c44c83032c3a04 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -347,8 +347,8 @@
 337  common	clock_adjtime		sys_clock_adjtime		compat_sys_clock_adjtime
 338  common	syncfs			sys_syncfs			sys_syncfs
 339  common	setns			sys_setns			sys_setns
-340  common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-341  common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+340  common	process_vm_readv	sys_process_vm_readv		sys_process_vm_readv
+341  common	process_vm_writev	sys_process_vm_writev		sys_process_vm_writev
 342  common	s390_runtime_instr	sys_s390_runtime_instr		sys_s390_runtime_instr
 343  common	kcmp			sys_kcmp			compat_sys_kcmp
 344  common	finit_module		sys_finit_module		compat_sys_finit_module
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index 642af919183de4..347809649ba28f 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -395,8 +395,8 @@
 536	x32	rt_tgsigqueueinfo	compat_sys_rt_tgsigqueueinfo
 537	x32	recvmmsg		compat_sys_recvmmsg_time64
 538	x32	sendmmsg		compat_sys_sendmmsg
-539	x32	process_vm_readv	compat_sys_process_vm_readv
-540	x32	process_vm_writev	compat_sys_process_vm_writev
+539	x32	process_vm_readv	sys_process_vm_readv
+540	x32	process_vm_writev	sys_process_vm_writev
 541	x32	setsockopt		sys_setsockopt
 542	x32	getsockopt		sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov
  2020-09-18 12:45 let import_iovec deal with compat_iovecs as well Christoph Hellwig
                   ` (7 preceding siblings ...)
  2020-09-18 12:45 ` [PATCH 8/9] mm: remove compat_process_vm_{readv,writev} Christoph Hellwig
@ 2020-09-18 12:45 ` Christoph Hellwig
  2020-09-19 14:24 ` let import_iovec deal with compat_iovecs as well David Laight
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-18 12:45 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native version of
keyctl_instantiate_key_iov can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 security/keys/compat.c   | 36 ++----------------------------------
 security/keys/internal.h |  5 -----
 security/keys/keyctl.c   |  2 +-
 3 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/security/keys/compat.c b/security/keys/compat.c
index 7ae531db031cf8..1545efdca56227 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -11,38 +11,6 @@
 #include <linux/slab.h>
 #include "internal.h"
 
-/*
- * Instantiate a key with the specified compatibility multipart payload and
- * link the key into the destination keyring if one is given.
- *
- * The caller must have the appropriate instantiation permit set for this to
- * work (see keyctl_assume_authority).  No other permissions are required.
- *
- * If successful, 0 will be returned.
- */
-static long compat_keyctl_instantiate_key_iov(
-	key_serial_t id,
-	const struct compat_iovec __user *_payload_iov,
-	unsigned ioc,
-	key_serial_t ringid)
-{
-	struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
-	struct iov_iter from;
-	long ret;
-
-	if (!_payload_iov)
-		ioc = 0;
-
-	ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
-			   ioc, ARRAY_SIZE(iovstack), &iov, &from);
-	if (ret < 0)
-		return ret;
-
-	ret = keyctl_instantiate_key_common(id, &from, ringid);
-	kfree(iov);
-	return ret;
-}
-
 /*
  * The key control system call, 32-bit compatibility version for 64-bit archs
  */
@@ -113,8 +81,8 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option,
 		return keyctl_reject_key(arg2, arg3, arg4, arg5);
 
 	case KEYCTL_INSTANTIATE_IOV:
-		return compat_keyctl_instantiate_key_iov(
-			arg2, compat_ptr(arg3), arg4, arg5);
+		return keyctl_instantiate_key_iov(arg2, compat_ptr(arg3), arg4,
+						  arg5);
 
 	case KEYCTL_INVALIDATE:
 		return keyctl_invalidate_key(arg2);
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 338a526cbfa516..9b9cf3b6fcbb4d 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -262,11 +262,6 @@ extern long keyctl_instantiate_key_iov(key_serial_t,
 				       const struct iovec __user *,
 				       unsigned, key_serial_t);
 extern long keyctl_invalidate_key(key_serial_t);
-
-struct iov_iter;
-extern long keyctl_instantiate_key_common(key_serial_t,
-					  struct iov_iter *,
-					  key_serial_t);
 extern long keyctl_restrict_keyring(key_serial_t id,
 				    const char __user *_type,
 				    const char __user *_restriction);
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 9febd37a168fd0..e26bbccda7ccee 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -1164,7 +1164,7 @@ static int keyctl_change_reqkey_auth(struct key *key)
  *
  * If successful, 0 will be returned.
  */
-long keyctl_instantiate_key_common(key_serial_t id,
+static long keyctl_instantiate_key_common(key_serial_t id,
 				   struct iov_iter *from,
 				   key_serial_t ringid)
 {
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 3/9] fs: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector
  2020-09-18 12:45 ` [PATCH 3/9] fs: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector Christoph Hellwig
@ 2020-09-18 12:56   ` Matthew Wilcox
  2020-09-18 13:39   ` Johannes Thumshirn
  1 sibling, 0 replies; 171+ messages in thread
From: Matthew Wilcox @ 2020-09-18 12:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Fri, Sep 18, 2020 at 02:45:27PM +0200, Christoph Hellwig wrote:
>  		}
> -		if (type >= 0
> -		    && unlikely(!access_ok(buf, len))) {
> +		if (type != CHECK_IOVEC_ONLY && unlikely(!access_ok(buf, len))) {

drop the unlikely() at the same time?  if it's really advantageous,
that should be embedded in the access_ok macro.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 2/9] compat.h: fix a spelling error in <linux/compat.h>
  2020-09-18 12:45 ` [PATCH 2/9] compat.h: fix a spelling error in <linux/compat.h> Christoph Hellwig
@ 2020-09-18 13:37   ` Johannes Thumshirn
  0 siblings, 0 replies; 171+ messages in thread
From: Johannes Thumshirn @ 2020-09-18 13:37 UTC (permalink / raw)
  To: Christoph Hellwig, Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On 18/09/2020 14:48, Christoph Hellwig wrote:
> We only have not compat_sys_readv64v2 syscall, only a
We have no?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 3/9] fs: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector
  2020-09-18 12:45 ` [PATCH 3/9] fs: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector Christoph Hellwig
  2020-09-18 12:56   ` Matthew Wilcox
@ 2020-09-18 13:39   ` Johannes Thumshirn
  1 sibling, 0 replies; 171+ messages in thread
From: Johannes Thumshirn @ 2020-09-18 13:39 UTC (permalink / raw)
  To: Christoph Hellwig, Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-18 12:45 ` [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag Christoph Hellwig
@ 2020-09-18 13:40   ` Al Viro
  2020-09-18 13:44     ` Christoph Hellwig
  2020-09-20 15:15   ` Matthew Wilcox
  1 sibling, 1 reply; 171+ messages in thread
From: Al Viro @ 2020-09-18 13:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote:
> Add a flag to force processing a syscall as a compat syscall.  This is
> required so that in_compat_syscall() works for I/O submitted by io_uring
> helper threads on behalf of compat syscalls.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  arch/sparc/include/asm/compat.h | 3 ++-
>  arch/x86/include/asm/compat.h   | 2 +-
>  fs/io_uring.c                   | 9 +++++++++
>  include/linux/compat.h          | 5 ++++-
>  include/linux/sched.h           | 1 +
>  5 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/sparc/include/asm/compat.h b/arch/sparc/include/asm/compat.h
> index 40a267b3bd5208..fee6c51d36e869 100644
> --- a/arch/sparc/include/asm/compat.h
> +++ b/arch/sparc/include/asm/compat.h
> @@ -211,7 +211,8 @@ static inline int is_compat_task(void)
>  static inline bool in_compat_syscall(void)
>  {
>  	/* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */
> -	return pt_regs_trap_type(current_pt_regs()) = 0x110;
> +	return pt_regs_trap_type(current_pt_regs()) = 0x110 ||
> +		(current->flags & PF_FORCE_COMPAT);

Can't say I like that approach ;-/  Reasoning about the behaviour is much
harder when it's controlled like that - witness set_fs() shite...

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-18 13:40   ` Al Viro
@ 2020-09-18 13:44     ` Christoph Hellwig
  2020-09-18 13:58       ` Al Viro
  2020-09-18 13:59       ` Arnd Bergmann
  0 siblings, 2 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-18 13:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Fri, Sep 18, 2020 at 02:40:12PM +0100, Al Viro wrote:
> >  	/* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */
> > -	return pt_regs_trap_type(current_pt_regs()) = 0x110;
> > +	return pt_regs_trap_type(current_pt_regs()) = 0x110 ||
> > +		(current->flags & PF_FORCE_COMPAT);
> 
> Can't say I like that approach ;-/  Reasoning about the behaviour is much
> harder when it's controlled like that - witness set_fs() shite...

I don't particularly like it either.  But do you have a better idea
how to deal with io_uring vs compat tasks?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 8/9] mm: remove compat_process_vm_{readv,writev}
  2020-09-18 12:45 ` [PATCH 8/9] mm: remove compat_process_vm_{readv,writev} Christoph Hellwig
@ 2020-09-18 13:48   ` Arnd Bergmann
  0 siblings, 0 replies; 171+ messages in thread
From: Arnd Bergmann @ 2020-09-18 13:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, Andrew Morton, Jens Axboe, David Howells,
	Linux ARM, the arch/x86 maintainers, linux-kernel,
	open list:BROADCOM NVRAM DRIVER, Parisc List, linuxppc-dev,
	linux-s390, sparclinux, linux-block, linux-scsi,
	Linux FS-devel Mailing List, linux-aio, io-uring, linux-arch,
	Linux-MM, Networking, keyrings, LSM List

On Fri, Sep 18, 2020 at 2:45 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Now that import_iovec handles compat iovecs, the native syscalls
> can be used for the compat case as well.
>
> diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
> index a4840b9d50ad14..f2fe0a33bcfdd5 100644
> --- a/arch/x86/entry/syscall_x32.c
> +++ b/arch/x86/entry/syscall_x32.c
> @@ -17,6 +17,8 @@
>  #define __x32_sys_getsockopt   __x64_sys_getsockopt
>  #define __x32_sys_setsockopt   __x64_sys_setsockopt
>  #define __x32_sys_vmsplice     __x64_sys_vmsplice
> +#define __x32_sys_process_vm_readv     __x64_sys_process_vm_readv
> +#define __x32_sys_process_vm_writev    __x64_sys_process_vm_writev
>
>  #define __SYSCALL_64(nr, sym)
>

I forgot this hack existed, and just sent a patch with subject "x86: add
__X32_COND_SYSCALL() macro" instead.

If I understand this right, the macros above should no longer be needed
once my patch gets merged.

        Arnd

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-18 13:44     ` Christoph Hellwig
@ 2020-09-18 13:58       ` Al Viro
  2020-09-18 15:16         ` Christoph Hellwig
  2020-09-19 14:53         ` David Laight
  2020-09-18 13:59       ` Arnd Bergmann
  1 sibling, 2 replies; 171+ messages in thread
From: Al Viro @ 2020-09-18 13:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On Fri, Sep 18, 2020 at 03:44:06PM +0200, Christoph Hellwig wrote:
> On Fri, Sep 18, 2020 at 02:40:12PM +0100, Al Viro wrote:
> > >  	/* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */
> > > -	return pt_regs_trap_type(current_pt_regs()) = 0x110;
> > > +	return pt_regs_trap_type(current_pt_regs()) = 0x110 ||
> > > +		(current->flags & PF_FORCE_COMPAT);
> > 
> > Can't say I like that approach ;-/  Reasoning about the behaviour is much
> > harder when it's controlled like that - witness set_fs() shite...
> 
> I don't particularly like it either.  But do you have a better idea
> how to deal with io_uring vs compat tasks?

<wry> git rm fs/io_uring.c would make a good starting point </wry>
Yes, I know it's not going to happen, but one can dream...

Said that, why not provide a variant that would take an explicit
"is it compat" argument and use it there?  And have the normal
one pass in_compat_syscall() to that...

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-18 13:44     ` Christoph Hellwig
  2020-09-18 13:58       ` Al Viro
@ 2020-09-18 13:59       ` Arnd Bergmann
  1 sibling, 0 replies; 171+ messages in thread
From: Arnd Bergmann @ 2020-09-18 13:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Andrew Morton, Jens Axboe, David Howells, Linux ARM,
	the arch/x86 maintainers, linux-kernel,
	open list:BROADCOM NVRAM DRIVER, Parisc List, linuxppc-dev,
	linux-s390, sparclinux, linux-block, linux-scsi,
	Linux FS-devel Mailing List, linux-aio, io-uring, linux-arch,
	Linux-MM, Networking, keyrings, LSM List

On Fri, Sep 18, 2020 at 3:44 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Fri, Sep 18, 2020 at 02:40:12PM +0100, Al Viro wrote:
> > >     /* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */
> > > -   return pt_regs_trap_type(current_pt_regs()) = 0x110;
> > > +   return pt_regs_trap_type(current_pt_regs()) = 0x110 ||
> > > +           (current->flags & PF_FORCE_COMPAT);
> >
> > Can't say I like that approach ;-/  Reasoning about the behaviour is much
> > harder when it's controlled like that - witness set_fs() shite...
>
> I don't particularly like it either.  But do you have a better idea
> how to deal with io_uring vs compat tasks?

Do we need to worry about something other than the compat_iovec
struct for now? Regarding the code in io_import_iovec(), it would
seem that can easily be handled by exposing an internal helper.
Instead of

#ifdef CONFIG_COMPAT
     if (req->ctx->compat)
            return compat_import_iovec(rw, buf, sqe_len, UIO_FASTIOV,
iovec, iter);
#endif
        return import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter);

This could do

    __import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec,
                     iter, req->ctx->compat);

With the normal import_iovec() becoming a trivial wrapper around
the same thing:

ssize_t import_iovec(int type, const struct iovec __user * uvector,
                 unsigned nr_segs, unsigned fast_segs,
                 struct iovec **iov, struct iov_iter *i)
{
     return __import_iovec(type, uvector, nr_segs, fast_segs, iov,
              i, in_compat_syscall());
}


         Arnd

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-18 13:58       ` Al Viro
@ 2020-09-18 15:16         ` Christoph Hellwig
  2020-09-19 16:21           ` Andy Lutomirski
  2020-09-19 22:09           ` Al Viro
  2020-09-19 14:53         ` David Laight
  1 sibling, 2 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-18 15:16 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> Said that, why not provide a variant that would take an explicit
> "is it compat" argument and use it there?  And have the normal
> one pass in_compat_syscall() to that...

That would help to not introduce a regression with this series yes.
But it wouldn't fix existing bugs when io_uring is used to access
read or write methods that use in_compat_syscall().  One example that
I recently ran into is drivers/scsi/sg.c.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: let import_iovec deal with compat_iovecs as well
  2020-09-18 12:45 let import_iovec deal with compat_iovecs as well Christoph Hellwig
                   ` (8 preceding siblings ...)
  2020-09-18 12:45 ` [PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov Christoph Hellwig
@ 2020-09-19 14:24 ` David Laight
  2020-09-21  4:41   ` 'Christoph Hellwig'
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
  2020-09-25  4:51 ` let import_iovec deal with compat_iovecs as well v4 Christoph Hellwig
  11 siblings, 1 reply; 171+ messages in thread
From: David Laight @ 2020-09-19 14:24 UTC (permalink / raw)
  To: 'Christoph Hellwig', Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: Christoph Hellwig
> Sent: 18 September 2020 13:45
> 
> this series changes import_iovec to transparently deal with comat iovec
> structures, and then cleanups up a lot of code dupliation.  But to get
> there it first has to fix the pre-existing bug that io_uring compat
> contexts don't trigger the in_compat_syscall() check.  This has so far
> been relatively harmless as very little code callable from io_uring used
> the check, and even that code that could be called usually wasn't.

I thought about that change while writing my import_iovec() => iovec_import()
patch - and thought that the io_uring code would (as usual) cause grief.

Christoph - did you see those patches?
	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-18 13:58       ` Al Viro
  2020-09-18 15:16         ` Christoph Hellwig
@ 2020-09-19 14:53         ` David Laight
  1 sibling, 0 replies; 171+ messages in thread
From: David Laight @ 2020-09-19 14:53 UTC (permalink / raw)
  To: 'Al Viro', Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: Al Viro
> Sent: 18 September 2020 14:58
> 
> On Fri, Sep 18, 2020 at 03:44:06PM +0200, Christoph Hellwig wrote:
> > On Fri, Sep 18, 2020 at 02:40:12PM +0100, Al Viro wrote:
> > > >  	/* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */
> > > > -	return pt_regs_trap_type(current_pt_regs()) = 0x110;
> > > > +	return pt_regs_trap_type(current_pt_regs()) = 0x110 ||
> > > > +		(current->flags & PF_FORCE_COMPAT);
> > >
> > > Can't say I like that approach ;-/  Reasoning about the behaviour is much
> > > harder when it's controlled like that - witness set_fs() shite...
> >
> > I don't particularly like it either.  But do you have a better idea
> > how to deal with io_uring vs compat tasks?
> 
> <wry> git rm fs/io_uring.c would make a good starting point </wry>
> Yes, I know it's not going to happen, but one can dream...

Maybe the io_uring code needs some changes to make it vaguely safe.
- No support for 32-bit compat mixed working (or at all?).
  Plausibly a special worker could do 32bit work.
- ring structure (I'm assuming mapped by mmap()) never mapped
  in more than one process (not cloned by fork()).
- No implicit handover of files to another process.
  Would need an munmap, handover, mmap sequence.

In any case the io_ring rather abuses the import_iovec() interface.

The canonical sequence is (types from memory):
	struct iovec cache[8], *iov = cache;
	struct iter iter;
	...
	rval = import_iovec(..., &iov, 8, &iter);
	// Do read/write user using 'iter'
	free(iov);

I don't think there is any strict requirement that iter.iov
is set to either 'cache' or 'iov' (it probably must point
into one of them.)
But the io_uring code will make that assumption because the
actual copies can be done much later and it doesn't save 'iter'.
It gets itself in a right mess because it doesn't separate
the 'address I need to free' from 'the iov[] for any transfers'.

io_uring is also the only code that relies on import_iovec()
returning the iter.count on success.
It would be much better to have:
	iov = import_iovec(..., &cache, ...);
	free(iov);
and use ERR_PTR() et al for error detectoion.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-18 15:16         ` Christoph Hellwig
@ 2020-09-19 16:21           ` Andy Lutomirski
  2020-09-19 21:16             ` Arnd Bergmann
  2020-09-19 22:09           ` Al Viro
  1 sibling, 1 reply; 171+ messages in thread
From: Andy Lutomirski @ 2020-09-19 16:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, X86 ML, LKML, open list:MIPS, Parisc List,
	linuxppc-dev, linux-s390, sparclinux, linux-block,
	Linux SCSI List, Linux FS Devel, linux-aio, io-uring, linux-arch,
	Linux-MM, Network Development, keyrings, LSM List

On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> > Said that, why not provide a variant that would take an explicit
> > "is it compat" argument and use it there?  And have the normal
> > one pass in_compat_syscall() to that...
>
> That would help to not introduce a regression with this series yes.
> But it wouldn't fix existing bugs when io_uring is used to access
> read or write methods that use in_compat_syscall().  One example that
> I recently ran into is drivers/scsi/sg.c.

Aside from the potentially nasty use of per-task variables, one thing
I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
going to have a generic mechanism for this, shouldn't we allow a full
override of the syscall arch instead of just allowing forcing compat
so that a compat syscall can do a non-compat operation?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-19 16:21           ` Andy Lutomirski
@ 2020-09-19 21:16             ` Arnd Bergmann
  2020-09-19 21:52               ` Finn Thain
  2020-09-19 22:22               ` Andy Lutomirski
  0 siblings, 2 replies; 171+ messages in thread
From: Arnd Bergmann @ 2020-09-19 21:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Christoph Hellwig, Al Viro, Andrew Morton, Jens Axboe,
	David Howells, linux-arm-kernel, X86 ML, LKML, open list:MIPS,
	Parisc List, linuxppc-dev, linux-s390, sparclinux, linux-block,
	Linux SCSI List, Linux FS Devel, linux-aio, io-uring, linux-arch,
	Linux-MM, Network Development, keyrings, LSM List

On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <luto@kernel.org> wrote:
> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote:
> > On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> > > Said that, why not provide a variant that would take an explicit
> > > "is it compat" argument and use it there?  And have the normal
> > > one pass in_compat_syscall() to that...
> >
> > That would help to not introduce a regression with this series yes.
> > But it wouldn't fix existing bugs when io_uring is used to access
> > read or write methods that use in_compat_syscall().  One example that
> > I recently ran into is drivers/scsi/sg.c.

Ah, so reading /dev/input/event* would suffer from the same issue,
and that one would in fact be broken by your patch in the hypothetical
case that someone tried to use io_uring to read /dev/input/event on x32...

For reference, I checked the socket timestamp handling that has a
number of corner cases with time32/time64 formats in compat mode,
but none of those appear to be affected by the problem.

> Aside from the potentially nasty use of per-task variables, one thing
> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
> going to have a generic mechanism for this, shouldn't we allow a full
> override of the syscall arch instead of just allowing forcing compat
> so that a compat syscall can do a non-compat operation?

The only reason it's needed here is that the caller is in a kernel
thread rather than a system call. Are there any possible scenarios
where one would actually need the opposite?

       Arnd

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-19 21:16             ` Arnd Bergmann
@ 2020-09-19 21:52               ` Finn Thain
  2020-09-19 22:22               ` Andy Lutomirski
  1 sibling, 0 replies; 171+ messages in thread
From: Finn Thain @ 2020-09-19 21:52 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Peter Maydell, Andy Lutomirski, Christoph Hellwig, Al Viro,
	Andrew Morton, Jens Axboe, David Howells, linux-arm-kernel,
	X86 ML, LKML, open list:MIPS, Parisc List, linuxppc-dev,
	linux-s390, sparclinux, linux-block, Linux SCSI List,
	Linux FS Devel, linux-aio, io-uring, linux-arch, Linux-MM,
	Network Development, keyrings, LSM List

On Sat, 19 Sep 2020, Arnd Bergmann wrote:

> On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <luto@kernel.org> wrote:
> > On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote:
> > > On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> > > > Said that, why not provide a variant that would take an explicit 
> > > > "is it compat" argument and use it there?  And have the normal one 
> > > > pass in_compat_syscall() to that...
> > >
> > > That would help to not introduce a regression with this series yes. 
> > > But it wouldn't fix existing bugs when io_uring is used to access 
> > > read or write methods that use in_compat_syscall().  One example 
> > > that I recently ran into is drivers/scsi/sg.c.
> 
> Ah, so reading /dev/input/event* would suffer from the same issue, and 
> that one would in fact be broken by your patch in the hypothetical case 
> that someone tried to use io_uring to read /dev/input/event on x32...
> 
> For reference, I checked the socket timestamp handling that has a number 
> of corner cases with time32/time64 formats in compat mode, but none of 
> those appear to be affected by the problem.
> 
> > Aside from the potentially nasty use of per-task variables, one thing 
> > I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're 
> > going to have a generic mechanism for this, shouldn't we allow a full 
> > override of the syscall arch instead of just allowing forcing compat 
> > so that a compat syscall can do a non-compat operation?
> 
> The only reason it's needed here is that the caller is in a kernel 
> thread rather than a system call. Are there any possible scenarios where 
> one would actually need the opposite?
> 

Quite possibly. The ext4 vs. compat getdents bug is still unresolved. 
Please see,
https://lore.kernel.org/lkml/CAFEAcA9W+JK7_TrtTnL1P2ES1knNPJX9wcUvhfLwxLq9augq1w@mail.gmail.com/

>        Arnd
> 

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-18 15:16         ` Christoph Hellwig
  2020-09-19 16:21           ` Andy Lutomirski
@ 2020-09-19 22:09           ` Al Viro
  2020-09-19 22:23             ` Andy Lutomirski
  2020-09-20 13:55             ` Arnd Bergmann
  1 sibling, 2 replies; 171+ messages in thread
From: Al Viro @ 2020-09-19 22:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, x86, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On Fri, Sep 18, 2020 at 05:16:15PM +0200, Christoph Hellwig wrote:
> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> > Said that, why not provide a variant that would take an explicit
> > "is it compat" argument and use it there?  And have the normal
> > one pass in_compat_syscall() to that...
> 
> That would help to not introduce a regression with this series yes.
> But it wouldn't fix existing bugs when io_uring is used to access
> read or write methods that use in_compat_syscall().  One example that
> I recently ran into is drivers/scsi/sg.c.

So screw such read/write methods - don't use them with io_uring.
That, BTW, is one of the reasons I'm sceptical about burying the
decisions deep into the callchain - we don't _want_ different
data layouts on read/write depending upon the 32bit vs. 64bit
caller, let alone the pointer-chasing garbage that is /dev/sg.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-19 21:16             ` Arnd Bergmann
  2020-09-19 21:52               ` Finn Thain
@ 2020-09-19 22:22               ` Andy Lutomirski
  2020-09-21 16:10                 ` Pavel Begunkov
  1 sibling, 1 reply; 171+ messages in thread
From: Andy Lutomirski @ 2020-09-19 22:22 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andy Lutomirski, Christoph Hellwig, Al Viro, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List


> On Sep 19, 2020, at 2:16 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> 
> ´╗┐On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <luto@kernel.org> wrote:
>>> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote:
>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
>>>> Said that, why not provide a variant that would take an explicit
>>>> "is it compat" argument and use it there?  And have the normal
>>>> one pass in_compat_syscall() to that...
>>> 
>>> That would help to not introduce a regression with this series yes.
>>> But it wouldn't fix existing bugs when io_uring is used to access
>>> read or write methods that use in_compat_syscall().  One example that
>>> I recently ran into is drivers/scsi/sg.c.
> 
> Ah, so reading /dev/input/event* would suffer from the same issue,
> and that one would in fact be broken by your patch in the hypothetical
> case that someone tried to use io_uring to read /dev/input/event on x32...
> 
> For reference, I checked the socket timestamp handling that has a
> number of corner cases with time32/time64 formats in compat mode,
> but none of those appear to be affected by the problem.
> 
>> Aside from the potentially nasty use of per-task variables, one thing
>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
>> going to have a generic mechanism for this, shouldn't we allow a full
>> override of the syscall arch instead of just allowing forcing compat
>> so that a compat syscall can do a non-compat operation?
> 
> The only reason it's needed here is that the caller is in a kernel
> thread rather than a system call. Are there any possible scenarios
> where one would actually need the opposite?
> 

I can certainly imagine needing to force x32 mode from a kernel thread.

As for the other direction: what exactly are the desired bitness/arch semantics of io_uring?  Is the operation bitness chosen by the io_uring creation or by the io_uring_enter() bitness?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-19 22:09           ` Al Viro
@ 2020-09-19 22:23             ` Andy Lutomirski
  2020-09-19 22:41               ` Al Viro
  2020-09-20 13:55             ` Arnd Bergmann
  1 sibling, 1 reply; 171+ messages in thread
From: Andy Lutomirski @ 2020-09-19 22:23 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module


> On Sep 19, 2020, at 3:09 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> ´╗┐On Fri, Sep 18, 2020 at 05:16:15PM +0200, Christoph Hellwig wrote:
>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
>>> Said that, why not provide a variant that would take an explicit
>>> "is it compat" argument and use it there?  And have the normal
>>> one pass in_compat_syscall() to that...
>> 
>> That would help to not introduce a regression with this series yes.
>> But it wouldn't fix existing bugs when io_uring is used to access
>> read or write methods that use in_compat_syscall().  One example that
>> I recently ran into is drivers/scsi/sg.c.
> 
> So screw such read/write methods - don't use them with io_uring.
> That, BTW, is one of the reasons I'm sceptical about burying the
> decisions deep into the callchain - we don't _want_ different
> data layouts on read/write depending upon the 32bit vs. 64bit
> caller, let alone the pointer-chasing garbage that is /dev/sg.

Well, we could remove in_compat_syscall(), etc and instead have an implicit parameter in DEFINE_SYSCALL.  Then everything would have to be explicit.  This would probably be a win, although it could be quite a bit of work.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-19 22:23             ` Andy Lutomirski
@ 2020-09-19 22:41               ` Al Viro
  2020-09-19 22:53                 ` Andy Lutomirski
  0 siblings, 1 reply; 171+ messages in thread
From: Al Viro @ 2020-09-19 22:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Sat, Sep 19, 2020 at 03:23:54PM -0700, Andy Lutomirski wrote:
> 
> > On Sep 19, 2020, at 3:09 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > 
> > ´╗┐On Fri, Sep 18, 2020 at 05:16:15PM +0200, Christoph Hellwig wrote:
> >>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> >>> Said that, why not provide a variant that would take an explicit
> >>> "is it compat" argument and use it there?  And have the normal
> >>> one pass in_compat_syscall() to that...
> >> 
> >> That would help to not introduce a regression with this series yes.
> >> But it wouldn't fix existing bugs when io_uring is used to access
> >> read or write methods that use in_compat_syscall().  One example that
> >> I recently ran into is drivers/scsi/sg.c.
> > 
> > So screw such read/write methods - don't use them with io_uring.
> > That, BTW, is one of the reasons I'm sceptical about burying the
> > decisions deep into the callchain - we don't _want_ different
> > data layouts on read/write depending upon the 32bit vs. 64bit
> > caller, let alone the pointer-chasing garbage that is /dev/sg.
> 
> Well, we could remove in_compat_syscall(), etc and instead have an implicit parameter in DEFINE_SYSCALL.  Then everything would have to be explicit.  This would probably be a win, although it could be quite a bit of work.

It would not be a win - most of the syscalls don't give a damn
about 32bit vs. 64bit...

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-19 22:41               ` Al Viro
@ 2020-09-19 22:53                 ` Andy Lutomirski
  2020-09-19 23:24                   ` Al Viro
  0 siblings, 1 reply; 171+ messages in thread
From: Andy Lutomirski @ 2020-09-19 22:53 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module



> On Sep 19, 2020, at 3:41 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> On Sat, Sep 19, 2020 at 03:23:54PM -0700, Andy Lutomirski wrote:
>> 
>>>> On Sep 19, 2020, at 3:09 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>> 
>>> On Fri, Sep 18, 2020 at 05:16:15PM +0200, Christoph Hellwig wrote:
>>>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
>>>>> Said that, why not provide a variant that would take an explicit
>>>>> "is it compat" argument and use it there?  And have the normal
>>>>> one pass in_compat_syscall() to that...
>>>> 
>>>> That would help to not introduce a regression with this series yes.
>>>> But it wouldn't fix existing bugs when io_uring is used to access
>>>> read or write methods that use in_compat_syscall().  One example that
>>>> I recently ran into is drivers/scsi/sg.c.
>>> 
>>> So screw such read/write methods - don't use them with io_uring.
>>> That, BTW, is one of the reasons I'm sceptical about burying the
>>> decisions deep into the callchain - we don't _want_ different
>>> data layouts on read/write depending upon the 32bit vs. 64bit
>>> caller, let alone the pointer-chasing garbage that is /dev/sg.
>> 
>> Well, we could remove in_compat_syscall(), etc and instead have an implicit parameter in DEFINE_SYSCALL.  Then everything would have to be explicit.  This would probably be a win, although it could be quite a bit of work.
> 
> It would not be a win - most of the syscalls don't give a damn
> about 32bit vs. 64bit...

Any reasonable implementation would optimize it out for syscalls that don’t care.  Or it could be explicit:

DEFINE_MULTIARCH_SYSCALL(...)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-19 22:53                 ` Andy Lutomirski
@ 2020-09-19 23:24                   ` Al Viro
  2020-09-20  0:14                     ` Andy Lutomirski
  0 siblings, 1 reply; 171+ messages in thread
From: Al Viro @ 2020-09-19 23:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Sat, Sep 19, 2020 at 03:53:40PM -0700, Andy Lutomirski wrote:

> > It would not be a win - most of the syscalls don't give a damn
> > about 32bit vs. 64bit...
> 
> Any reasonable implementation would optimize it out for syscalls that don’t care.  Or it could be explicit:
> 
> DEFINE_MULTIARCH_SYSCALL(...)

1) what would that look like?
2) have you counted the syscalls that do and do not need that?
3) how many of those realistically *can* be unified with their
compat counterparts?  [hint: ioctl(2) cannot]

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-19 23:24                   ` Al Viro
@ 2020-09-20  0:14                     ` Andy Lutomirski
  2020-09-20  2:57                       ` Al Viro
  0 siblings, 1 reply; 171+ messages in thread
From: Andy Lutomirski @ 2020-09-20  0:14 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, X86 ML, LKML, open list:MIPS,
	Parisc List, linuxppc-dev, linux-s390, sparclinux, linux-block,
	Linux SCSI List, Linux FS Devel, linux-aio, io-uring, linux-arch,
	Linux-MM, Network Development, keyrings, LSM List

On Sat, Sep 19, 2020 at 4:24 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Sep 19, 2020 at 03:53:40PM -0700, Andy Lutomirski wrote:
>
> > > It would not be a win - most of the syscalls don't give a damn
> > > about 32bit vs. 64bit...
> >
> > Any reasonable implementation would optimize it out for syscalls that don’t care.  Or it could be explicit:
> >
> > DEFINE_MULTIARCH_SYSCALL(...)
>
> 1) what would that look like?

In effect, it would work like this:

/* Arch-specific, but there's a generic case for sane architectures. */
enum syscall_arch {
  SYSCALL_NATIVE,
  SYSCALL_COMPAT,
  SYSCALL_X32,
};

DEFINE_MULTIARCH_SYSCALLn(args, arch)
{
  args are the args here, and arch is the arch.
}

> 2) have you counted the syscalls that do and do not need that?

No.

> 3) how many of those realistically *can* be unified with their
> compat counterparts?  [hint: ioctl(2) cannot]

There would be no requirement to unify anything.  The idea is that
we'd get rid of all the global state flags.

For ioctl, we'd have a new file_operation:

long ioctl(struct file *, unsigned int, unsigned long, enum syscall_arch);

I'm not saying this is easy, but I think it's possible and the result
would be more obviously correct than what we have now.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20  0:14                     ` Andy Lutomirski
@ 2020-09-20  2:57                       ` Al Viro
  2020-09-20 16:59                         ` Andy Lutomirski
  0 siblings, 1 reply; 171+ messages in thread
From: Al Viro @ 2020-09-20  2:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, X86 ML, LKML, open list:MIPS,
	Parisc List, linuxppc-dev, linux-s390, sparclinux, linux-block,
	Linux SCSI List, Linux FS Devel, linux-aio, io-uring, linux-arch,
	Linux-MM, Network Development, keyrings, LSM List

On Sat, Sep 19, 2020 at 05:14:41PM -0700, Andy Lutomirski wrote:

> > 2) have you counted the syscalls that do and do not need that?
> 
> No.

Might be illuminating...

> > 3) how many of those realistically *can* be unified with their
> > compat counterparts?  [hint: ioctl(2) cannot]
> 
> There would be no requirement to unify anything.  The idea is that
> we'd get rid of all the global state flags.

_What_ global state flags?  When you have separate SYSCALL_DEFINE3(ioctl...)
and COMPAT_SYSCALL_DEFINE3(ioctl...), there's no flags at all, global or
local.  They only come into the play when you try to share the same function
for both, right on the top level.

> For ioctl, we'd have a new file_operation:
> 
> long ioctl(struct file *, unsigned int, unsigned long, enum syscall_arch);
> 
> I'm not saying this is easy, but I think it's possible and the result
> would be more obviously correct than what we have now.

No, it would not.  Seriously, from time to time a bit of RTFS before grand
proposals turns out to be useful.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-19 22:09           ` Al Viro
  2020-09-19 22:23             ` Andy Lutomirski
@ 2020-09-20 13:55             ` Arnd Bergmann
  2020-09-20 15:02               ` Al Viro
  1 sibling, 1 reply; 171+ messages in thread
From: Arnd Bergmann @ 2020-09-20 13:55 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, David Howells,
	Linux ARM, the arch/x86 maintainers, linux-kernel,
	open list:BROADCOM NVRAM DRIVER, Parisc List, linuxppc-dev,
	linux-s390, sparclinux, linux-block, linux-scsi,
	Linux FS-devel Mailing List, linux-aio, io-uring, linux-arch,
	Linux-MM, Networking, keyrings, LSM List

On Sun, Sep 20, 2020 at 12:09 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Sep 18, 2020 at 05:16:15PM +0200, Christoph Hellwig wrote:
> > On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> > > Said that, why not provide a variant that would take an explicit
> > > "is it compat" argument and use it there?  And have the normal
> > > one pass in_compat_syscall() to that...
> >
> > That would help to not introduce a regression with this series yes.
> > But it wouldn't fix existing bugs when io_uring is used to access
> > read or write methods that use in_compat_syscall().  One example that
> > I recently ran into is drivers/scsi/sg.c.
>
> So screw such read/write methods - don't use them with io_uring.
> That, BTW, is one of the reasons I'm sceptical about burying the
> decisions deep into the callchain - we don't _want_ different
> data layouts on read/write depending upon the 32bit vs. 64bit
> caller, let alone the pointer-chasing garbage that is /dev/sg.

Would it be too late to limit what kind of file descriptors we allow
io_uring to read/write on?

If io_uring can get changed to return -EINVAL on trying to
read/write something other than S_IFREG file descriptors,
that particular problem space gets a lot simpler, but this
is of course only possible if nobody actually relies on it yet.

      Arnd

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 13:55             ` Arnd Bergmann
@ 2020-09-20 15:02               ` Al Viro
  0 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-09-20 15:02 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, David Howells,
	Linux ARM, the arch/x86 maintainers, linux-kernel,
	open list:BROADCOM NVRAM DRIVER, Parisc List, linuxppc-dev,
	linux-s390, sparclinux, linux-block, linux-scsi,
	Linux FS-devel Mailing List, linux-aio, io-uring, linux-arch,
	Linux-MM, Networking, keyrings, LSM List

On Sun, Sep 20, 2020 at 03:55:47PM +0200, Arnd Bergmann wrote:
> On Sun, Sep 20, 2020 at 12:09 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Fri, Sep 18, 2020 at 05:16:15PM +0200, Christoph Hellwig wrote:
> > > On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> > > > Said that, why not provide a variant that would take an explicit
> > > > "is it compat" argument and use it there?  And have the normal
> > > > one pass in_compat_syscall() to that...
> > >
> > > That would help to not introduce a regression with this series yes.
> > > But it wouldn't fix existing bugs when io_uring is used to access
> > > read or write methods that use in_compat_syscall().  One example that
> > > I recently ran into is drivers/scsi/sg.c.
> >
> > So screw such read/write methods - don't use them with io_uring.
> > That, BTW, is one of the reasons I'm sceptical about burying the
> > decisions deep into the callchain - we don't _want_ different
> > data layouts on read/write depending upon the 32bit vs. 64bit
> > caller, let alone the pointer-chasing garbage that is /dev/sg.
> 
> Would it be too late to limit what kind of file descriptors we allow
> io_uring to read/write on?
> 
> If io_uring can get changed to return -EINVAL on trying to
> read/write something other than S_IFREG file descriptors,
> that particular problem space gets a lot simpler, but this
> is of course only possible if nobody actually relies on it yet.

S_IFREG is almost certainly too heavy as a restriction.  Looking through
the stuff sensitive to 32bit/64bit, we seem to have
	* /dev/sg - pointer-chasing horror
	* sysfs files for efivar - different layouts for compat and native,
shitty userland ABI design (
struct efi_variable {
        efi_char16_t  VariableName[EFI_VAR_NAME_LEN/sizeof(efi_char16_t)];
        efi_guid_t    VendorGuid;
        unsigned long DataSize;
        __u8          Data[1024];
        efi_status_t  Status;
        __u32         Attributes;
} __attribute__((packed));
) is the piece of crap in question; 'DataSize' is where the headache comes
from.  Regular files, BTW...
	* uhid - character device, milder pointer-chasing horror.  Trouble
comes from this:
/* Obsolete! Use UHID_CREATE2. */
struct uhid_create_req {
        __u8 name[128];
        __u8 phys[64];
        __u8 uniq[64];
        __u8 __user *rd_data;
        __u16 rd_size;

        __u16 bus;
        __u32 vendor;
        __u32 product;
        __u32 version;
        __u32 country;
} __attribute__((__packed__));
and suggested replacement doesn't do any pointer-chasing (rd_data is an
embedded array in the end of struct uhid_create2_req).
	* evdev, uinput - bitness-sensitive layout, due to timestamps
	* /proc/bus/input/devices - weird crap with printing bitmap, different
_text_ layouts seen by 32bit and 64bit readers.  Binary structures are PITA,
but with sufficient effort you can screw the text just as hard...  Oh, and it's
a regular file.
	* similar in sysfs analogue

And AFAICS, that's it for read/write-related method instances.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-18 12:45 ` [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag Christoph Hellwig
  2020-09-18 13:40   ` Al Viro
@ 2020-09-20 15:15   ` Matthew Wilcox
  2020-09-20 15:55     ` William Kucharski
                       ` (2 more replies)
  1 sibling, 3 replies; 171+ messages in thread
From: Matthew Wilcox @ 2020-09-20 15:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote:
> Add a flag to force processing a syscall as a compat syscall.  This is
> required so that in_compat_syscall() works for I/O submitted by io_uring
> helper threads on behalf of compat syscalls.

Al doesn't like this much, but my suggestion is to introduce two new
opcodes -- IORING_OP_READV32 and IORING_OP_WRITEV32.  The compat code
can translate IORING_OP_READV to IORING_OP_READV32 and then the core
code can know what that user pointer is pointing to.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 15:15   ` Matthew Wilcox
@ 2020-09-20 15:55     ` William Kucharski
  2020-09-21 16:20       ` Pavel Begunkov
  2020-09-20 16:00     ` Arnd Bergmann
  2020-09-20 18:07     ` Al Viro
  2 siblings, 1 reply; 171+ messages in thread
From: William Kucharski @ 2020-09-20 15:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Alexander Viro, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, linux-arm-kernel, x86,
	linux-kernel, linux-mips, linux-parisc, linuxppc-dev, linux-s390,
	sparclinux, linux-block, linux-scsi, linux-fsdevel, linux-aio,
	io-uring, linux-arch, linux-mm, netdev, keyrings,
	linux-security-module

I really like that as it’s self-documenting and anyone debugging it can see what is actually being used at a glance.

> On Sep 20, 2020, at 09:15, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote:
>> Add a flag to force processing a syscall as a compat syscall.  This is
>> required so that in_compat_syscall() works for I/O submitted by io_uring
>> helper threads on behalf of compat syscalls.
> 
> Al doesn't like this much, but my suggestion is to introduce two new
> opcodes -- IORING_OP_READV32 and IORING_OP_WRITEV32.  The compat code
> can translate IORING_OP_READV to IORING_OP_READV32 and then the core
> code can know what that user pointer is pointing to.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 15:15   ` Matthew Wilcox
  2020-09-20 15:55     ` William Kucharski
@ 2020-09-20 16:00     ` Arnd Bergmann
  2020-09-20 18:07     ` Al Viro
  2 siblings, 0 replies; 171+ messages in thread
From: Arnd Bergmann @ 2020-09-20 16:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Alexander Viro, Andrew Morton, Jens Axboe,
	David Howells, Linux ARM, the arch/x86 maintainers, linux-kernel,
	open list:BROADCOM NVRAM DRIVER, Parisc List, linuxppc-dev,
	linux-s390, sparclinux, linux-block, linux-scsi,
	Linux FS-devel Mailing List, linux-aio, io-uring, linux-arch,
	Linux-MM, Networking, keyrings, LSM List

On Sun, Sep 20, 2020 at 5:15 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote:
> > Add a flag to force processing a syscall as a compat syscall.  This is
> > required so that in_compat_syscall() works for I/O submitted by io_uring
> > helper threads on behalf of compat syscalls.
>
> Al doesn't like this much, but my suggestion is to introduce two new
> opcodes -- IORING_OP_READV32 and IORING_OP_WRITEV32.  The compat code
> can translate IORING_OP_READV to IORING_OP_READV32 and then the core
> code can know what that user pointer is pointing to.

How is that different from the current approach of storing the ABI as
a flag in ctx->compat?

     Arnd

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20  2:57                       ` Al Viro
@ 2020-09-20 16:59                         ` Andy Lutomirski
  2020-09-20 18:12                           ` Al Viro
  0 siblings, 1 reply; 171+ messages in thread
From: Andy Lutomirski @ 2020-09-20 16:59 UTC (permalink / raw)
  To: Al Viro
  Cc: Andy Lutomirski, Christoph Hellwig, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On Sat, Sep 19, 2020 at 7:57 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Sep 19, 2020 at 05:14:41PM -0700, Andy Lutomirski wrote:
>
> > > 2) have you counted the syscalls that do and do not need that?
> >
> > No.
>
> Might be illuminating...
>
> > > 3) how many of those realistically *can* be unified with their
> > > compat counterparts?  [hint: ioctl(2) cannot]
> >
> > There would be no requirement to unify anything.  The idea is that
> > we'd get rid of all the global state flags.
>
> _What_ global state flags?  When you have separate SYSCALL_DEFINE3(ioctl...)
> and COMPAT_SYSCALL_DEFINE3(ioctl...), there's no flags at all, global or
> local.  They only come into the play when you try to share the same function
> for both, right on the top level.

...

>
> > For ioctl, we'd have a new file_operation:
> >
> > long ioctl(struct file *, unsigned int, unsigned long, enum syscall_arch);
> >
> > I'm not saying this is easy, but I think it's possible and the result
> > would be more obviously correct than what we have now.
>
> No, it would not.  Seriously, from time to time a bit of RTFS before grand
> proposals turns out to be useful.

As one example, look at __sys_setsockopt().  It's called for the
native and compat versions, and it contains an in_compat_syscall()
check.  (This particularly check looks dubious to me, but that's
another story.)  If this were to be done with equivalent semantics
without a separate COMPAT_DEFINE_SYSCALL and without
in_compat_syscall(), there would need to be some indication as to
whether this is compat or native setsockopt.  There are other
setsockopt implementations in the net stack with more
legitimate-seeming uses of in_compat_syscall() that would need some
other mechanism if in_compat_syscall() were to go away.

setsockopt is (I hope!) out of scope for io_uring, but the situation
isn't fundamentally different from read and write.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 15:15   ` Matthew Wilcox
  2020-09-20 15:55     ` William Kucharski
  2020-09-20 16:00     ` Arnd Bergmann
@ 2020-09-20 18:07     ` Al Viro
  2020-09-20 18:41       ` Al Viro
                         ` (2 more replies)
  2 siblings, 3 replies; 171+ messages in thread
From: Al Viro @ 2020-09-20 18:07 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Sun, Sep 20, 2020 at 04:15:10PM +0100, Matthew Wilcox wrote:
> On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote:
> > Add a flag to force processing a syscall as a compat syscall.  This is
> > required so that in_compat_syscall() works for I/O submitted by io_uring
> > helper threads on behalf of compat syscalls.
> 
> Al doesn't like this much, but my suggestion is to introduce two new
> opcodes -- IORING_OP_READV32 and IORING_OP_WRITEV32.  The compat code
> can translate IORING_OP_READV to IORING_OP_READV32 and then the core
> code can know what that user pointer is pointing to.

Let's separate two issues:
	1) compat syscalls want 32bit iovecs.  Nothing to do with the
drivers, dealt with just fine.
	2) a few drivers are really fucked in head.  They use different
*DATA* layouts for reads/writes, depending upon the calling process.
IOW, if you fork/exec a 32bit binary and your stdin is one of those,
reads from stdin in parent and child will yield different data layouts.
On the same struct file.
	That's what Christoph worries about (/dev/sg he'd mentioned is
one of those).

	IMO we should simply have that dozen or so of pathological files
marked with FMODE_SHITTY_ABI; it's not about how they'd been opened -
it describes the userland ABI provided by those.  And it's cast in stone.

	Any in_compat_syscall() in ->read()/->write() instances is an ABI
bug, plain and simple.  Some are unfixable for compatibility reasons, but
any new caller like that should be a big red flag.

	How we import iovec array is none of the drivers' concern; we do
not need to mess with in_compat_syscall() reporting the matching value,
etc. for that.  It's about the instances that want in_compat_syscall() to
decide between the 32bit and 64bit data layouts.  And I believe that
we should simply have them marked as such and rejected by io_uring.  With
any new occurences getting slapped down hard.

	Current list of those turds:
/dev/sg (pointer-chasing, generally insane)
/sys/firmware/efi/vars/*/raw_var (fucked binary structure)
/sys/firmware/efi/vars/new_var (fucked binary structure)
/sys/firmware/efi/vars/del_var (fucked binary structure)
/dev/uhid	(pointer-chasing for one obsolete command)
/dev/input/event* (timestamps)
/dev/uinput (timestamps)
/proc/bus/input/devices (fucked bitmap-to-text representation)
/sys/class/input/*/capabilities/* (fucked bitmap-to-text representation)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 16:59                         ` Andy Lutomirski
@ 2020-09-20 18:12                           ` Al Viro
  0 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-09-20 18:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, X86 ML, LKML, open list:MIPS,
	Parisc List, linuxppc-dev, linux-s390, sparclinux, linux-block,
	Linux SCSI List, Linux FS Devel, linux-aio, io-uring, linux-arch,
	Linux-MM, Network Development, keyrings, LSM List

On Sun, Sep 20, 2020 at 09:59:36AM -0700, Andy Lutomirski wrote:

> As one example, look at __sys_setsockopt().  It's called for the
> native and compat versions, and it contains an in_compat_syscall()
> check.  (This particularly check looks dubious to me, but that's
> another story.)  If this were to be done with equivalent semantics
> without a separate COMPAT_DEFINE_SYSCALL and without
> in_compat_syscall(), there would need to be some indication as to
> whether this is compat or native setsockopt.  There are other
> setsockopt implementations in the net stack with more
> legitimate-seeming uses of in_compat_syscall() that would need some
> other mechanism if in_compat_syscall() were to go away.
> 
> setsockopt is (I hope!) out of scope for io_uring, but the situation
> isn't fundamentally different from read and write.

	Except that setsockopt() had that crap very widespread; for read()
and write() those are very rare exceptions.

	Andy, please RTFS.  Or dig through archives.  The situation
with setsockopt() is *NOT* a good thing - it's (probably) the least
of the evils.  The last thing we need is making that the norm.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 18:07     ` Al Viro
@ 2020-09-20 18:41       ` Al Viro
  2020-09-20 19:01       ` Matthew Wilcox
  2020-09-20 19:14       ` Andy Lutomirski
  2 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-09-20 18:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Sun, Sep 20, 2020 at 07:07:42PM +0100, Al Viro wrote:

> /proc/bus/input/devices (fucked bitmap-to-text representation)

To illustrate the, er, beauty of that stuff:

; cat32 /proc/bus/input/devices >/tmp/a
; cat /proc/bus/input/devices >/tmp/b
; diff -u /tmp/a /tmp/b|grep '^[-+]'
--- /tmp/a      2020-09-20 14:28:43.442560691 -0400
+++ /tmp/b      2020-09-20 14:28:49.018543230 -0400
-B: KEY\x1100f 2902000 8380307c f910f001 feffffdf ffefffff ffffffff fffffffe
+B: KEY\x1100f02902000 8380307cf910f001 feffffdfffefffff fffffffffffffffe
-B: KEYp000 0 0 0 0 0 0 0 0
+B: KEYp000 0 0 0 0
-B: KEYB0 0 70000 0 0 0 0 0 0 0 0
+B: KEYB0 70000 0 0 0 0
-B: KEY\x100000 0 0 0
+B: KEY\x10000000000000 0
-B: KEY@00 0 0 0 0
+B: KEY@00 0 0
-B: KEY€00 0 0 0 0 0 1100b 800 2 200000 0 0 0 0
+B: KEY€0000000000 0 0 1100b00000800 200200000 0 0
-B: KEY>000b 0 0 0 0 0 0 0
+B: KEY>000b00000000 0 0 0
-B: KEY˙ffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff fffffffe
+B: KEY˙ffffffffffffff ffffffffffffffff ffffffffffffffff fffffffffffffffe
; 
(cat32 being a 32bit binary of cat)
All the differences are due to homegrown bitmap-to-text conversion.

Note that feeding that into a pipe leaves the recepient with a lovely problem -
you can't go by the width of words (they are not zero-padded) and you can't
go by the number of words either - it varies from device to device.

And there's nothing we can do with that - it's a userland ABI, can't be
changed without breaking stuff.  I would prefer to avoid additional examples
like that, TYVM...

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 18:07     ` Al Viro
  2020-09-20 18:41       ` Al Viro
@ 2020-09-20 19:01       ` Matthew Wilcox
  2020-09-20 19:10         ` Al Viro
  2020-09-20 19:14       ` Andy Lutomirski
  2 siblings, 1 reply; 171+ messages in thread
From: Matthew Wilcox @ 2020-09-20 19:01 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Sun, Sep 20, 2020 at 07:07:42PM +0100, Al Viro wrote:
> 	2) a few drivers are really fucked in head.  They use different
> *DATA* layouts for reads/writes, depending upon the calling process.
> IOW, if you fork/exec a 32bit binary and your stdin is one of those,
> reads from stdin in parent and child will yield different data layouts.
> On the same struct file.
> 	That's what Christoph worries about (/dev/sg he'd mentioned is
> one of those).
> 
> 	IMO we should simply have that dozen or so of pathological files
> marked with FMODE_SHITTY_ABI; it's not about how they'd been opened -
> it describes the userland ABI provided by those.  And it's cast in stone.
> 
> 	Any in_compat_syscall() in ->read()/->write() instances is an ABI
> bug, plain and simple.  Some are unfixable for compatibility reasons, but
> any new caller like that should be a big red flag.

So an IOCB_COMPAT flag would let us know whether the caller is expecting
a 32-bit or 64-bit layout?  And io_uring could set it based on the
ctx->compat flag.

> 	Current list of those turds:
> /dev/sg (pointer-chasing, generally insane)
> /sys/firmware/efi/vars/*/raw_var (fucked binary structure)
> /sys/firmware/efi/vars/new_var (fucked binary structure)
> /sys/firmware/efi/vars/del_var (fucked binary structure)
> /dev/uhid	(pointer-chasing for one obsolete command)
> /dev/input/event* (timestamps)
> /dev/uinput (timestamps)
> /proc/bus/input/devices (fucked bitmap-to-text representation)
> /sys/class/input/*/capabilities/* (fucked bitmap-to-text representation)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 19:01       ` Matthew Wilcox
@ 2020-09-20 19:10         ` Al Viro
  2020-09-20 19:22           ` Matthew Wilcox
  0 siblings, 1 reply; 171+ messages in thread
From: Al Viro @ 2020-09-20 19:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Sun, Sep 20, 2020 at 08:01:59PM +0100, Matthew Wilcox wrote:
> On Sun, Sep 20, 2020 at 07:07:42PM +0100, Al Viro wrote:
> > 	2) a few drivers are really fucked in head.  They use different
> > *DATA* layouts for reads/writes, depending upon the calling process.
> > IOW, if you fork/exec a 32bit binary and your stdin is one of those,
> > reads from stdin in parent and child will yield different data layouts.
> > On the same struct file.
> > 	That's what Christoph worries about (/dev/sg he'd mentioned is
> > one of those).
> > 
> > 	IMO we should simply have that dozen or so of pathological files
> > marked with FMODE_SHITTY_ABI; it's not about how they'd been opened -
> > it describes the userland ABI provided by those.  And it's cast in stone.
> > 
> > 	Any in_compat_syscall() in ->read()/->write() instances is an ABI
> > bug, plain and simple.  Some are unfixable for compatibility reasons, but
> > any new caller like that should be a big red flag.
> 
> So an IOCB_COMPAT flag would let us know whether the caller is expecting
> a 32-bit or 64-bit layout?  And io_uring could set it based on the
> ctx->compat flag.

So you want to convert all that crap to a form that would see iocb
(IOW, ->read_iter()/->write_iter())?  And, since quite a few are sysfs
attributes, propagate that through sysfs, changing the method signatures
to match that and either modifying fuckloads of instances or introducing
new special methods for the ones that are sensitive to that crap?

IMO it's much saner to mark those and refuse to touch them from io_uring...

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 18:07     ` Al Viro
  2020-09-20 18:41       ` Al Viro
  2020-09-20 19:01       ` Matthew Wilcox
@ 2020-09-20 19:14       ` Andy Lutomirski
  2020-09-21  4:28         ` Christoph Hellwig
  2 siblings, 1 reply; 171+ messages in thread
From: Andy Lutomirski @ 2020-09-20 19:14 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthew Wilcox, Christoph Hellwig, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On Sun, Sep 20, 2020 at 11:07 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, Sep 20, 2020 at 04:15:10PM +0100, Matthew Wilcox wrote:
> > On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote:
> > > Add a flag to force processing a syscall as a compat syscall.  This is
> > > required so that in_compat_syscall() works for I/O submitted by io_uring
> > > helper threads on behalf of compat syscalls.
> >
> > Al doesn't like this much, but my suggestion is to introduce two new
> > opcodes -- IORING_OP_READV32 and IORING_OP_WRITEV32.  The compat code
> > can translate IORING_OP_READV to IORING_OP_READV32 and then the core
> > code can know what that user pointer is pointing to.
>
> Let's separate two issues:
>         1) compat syscalls want 32bit iovecs.  Nothing to do with the
> drivers, dealt with just fine.
>         2) a few drivers are really fucked in head.  They use different
> *DATA* layouts for reads/writes, depending upon the calling process.
> IOW, if you fork/exec a 32bit binary and your stdin is one of those,
> reads from stdin in parent and child will yield different data layouts.
> On the same struct file.
>         That's what Christoph worries about (/dev/sg he'd mentioned is
> one of those).
>
>         IMO we should simply have that dozen or so of pathological files
> marked with FMODE_SHITTY_ABI; it's not about how they'd been opened -
> it describes the userland ABI provided by those.  And it's cast in stone.
>

I wonder if this is really quite cast in stone.  We could also have
FMODE_SHITTY_COMPAT and set that when a file like this is *opened* in
compat mode.  Then that particular struct file would be read and
written using the compat data format.  The change would be
user-visible, but the user that would see it would be very strange
indeed.

I don't have a strong opinion as to whether that is better or worse
than denying io_uring access to these things, but at least it moves
the special case out of io_uring.

--Andy

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 19:10         ` Al Viro
@ 2020-09-20 19:22           ` Matthew Wilcox
  2020-09-20 19:28             ` Andy Lutomirski
                               ` (2 more replies)
  0 siblings, 3 replies; 171+ messages in thread
From: Matthew Wilcox @ 2020-09-20 19:22 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> IMO it's much saner to mark those and refuse to touch them from io_uring...

Simpler solution is to remove io_uring from the 32-bit syscall list.
If you're a 32-bit process, you don't get to use io_uring.  Would
any real users actually care about that?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 19:22           ` Matthew Wilcox
@ 2020-09-20 19:28             ` Andy Lutomirski
  2020-09-20 20:49               ` Arnd Bergmann
  2020-09-20 21:42             ` Al Viro
  2020-09-21 16:26             ` Pavel Begunkov
  2 siblings, 1 reply; 171+ messages in thread
From: Andy Lutomirski @ 2020-09-20 19:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Al Viro, Christoph Hellwig, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On Sun, Sep 20, 2020 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> > IMO it's much saner to mark those and refuse to touch them from io_uring...
>
> Simpler solution is to remove io_uring from the 32-bit syscall list.
> If you're a 32-bit process, you don't get to use io_uring.  Would
> any real users actually care about that?

We could go one step farther and declare that we're done adding *any*
new compat syscalls :)



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 19:28             ` Andy Lutomirski
@ 2020-09-20 20:49               ` Arnd Bergmann
  2020-09-20 21:13                 ` David Laight
  0 siblings, 1 reply; 171+ messages in thread
From: Arnd Bergmann @ 2020-09-20 20:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Matthew Wilcox, Al Viro, Christoph Hellwig, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On Sun, Sep 20, 2020 at 9:28 PM Andy Lutomirski <luto@kernel.org> wrote:
> On Sun, Sep 20, 2020 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> > > IMO it's much saner to mark those and refuse to touch them from io_uring...
> >
> > Simpler solution is to remove io_uring from the 32-bit syscall list.
> > If you're a 32-bit process, you don't get to use io_uring.  Would
> > any real users actually care about that?
>
> We could go one step farther and declare that we're done adding *any*
> new compat syscalls :)

Would you also stop adding system calls to native 32-bit systems then?

On memory constrained systems (less than 2GB a.t.m.), there is still a
strong demand for running 32-bit user space, but all of the recent Arm
cores (after Cortex-A55) dropped the ability to run 32-bit kernels, so
that compat mode may eventually become the primary way to run
Linux on cheap embedded systems.

I don't think there is any chance we can realistically take away io_uring
from the 32-bit ABI any more now.

      Arnd

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 20:49               ` Arnd Bergmann
@ 2020-09-20 21:13                 ` David Laight
  2020-09-21 16:31                   ` Pavel Begunkov
  0 siblings, 1 reply; 171+ messages in thread
From: David Laight @ 2020-09-20 21:13 UTC (permalink / raw)
  To: 'Arnd Bergmann', Andy Lutomirski
  Cc: Matthew Wilcox, Al Viro, Christoph Hellwig, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

RnJvbTogQXJuZCBCZXJnbWFubg0KPiBTZW50OiAyMCBTZXB0ZW1iZXIgMjAyMCAyMTo0OQ0KPiAN
Cj4gT24gU3VuLCBTZXAgMjAsIDIwMjAgYXQgOToyOCBQTSBBbmR5IEx1dG9taXJza2kgPGx1dG9A
a2VybmVsLm9yZz4gd3JvdGU6DQo+ID4gT24gU3VuLCBTZXAgMjAsIDIwMjAgYXQgMTI6MjMgUE0g
TWF0dGhldyBXaWxjb3ggPHdpbGx5QGluZnJhZGVhZC5vcmc+IHdyb3RlOg0KPiA+ID4NCj4gPiA+
IE9uIFN1biwgU2VwIDIwLCAyMDIwIGF0IDA4OjEwOjMxUE0gKzAxMDAsIEFsIFZpcm8gd3JvdGU6
DQo+ID4gPiA+IElNTyBpdCdzIG11Y2ggc2FuZXIgdG8gbWFyayB0aG9zZSBhbmQgcmVmdXNlIHRv
IHRvdWNoIHRoZW0gZnJvbSBpb191cmluZy4uLg0KPiA+ID4NCj4gPiA+IFNpbXBsZXIgc29sdXRp
b24gaXMgdG8gcmVtb3ZlIGlvX3VyaW5nIGZyb20gdGhlIDMyLWJpdCBzeXNjYWxsIGxpc3QuDQo+
ID4gPiBJZiB5b3UncmUgYSAzMi1iaXQgcHJvY2VzcywgeW91IGRvbid0IGdldCB0byB1c2UgaW9f
dXJpbmcuICBXb3VsZA0KPiA+ID4gYW55IHJlYWwgdXNlcnMgYWN0dWFsbHkgY2FyZSBhYm91dCB0
aGF0Pw0KPiA+DQo+ID4gV2UgY291bGQgZ28gb25lIHN0ZXAgZmFydGhlciBhbmQgZGVjbGFyZSB0
aGF0IHdlJ3JlIGRvbmUgYWRkaW5nICphbnkqDQo+ID4gbmV3IGNvbXBhdCBzeXNjYWxscyA6KQ0K
PiANCj4gV291bGQgeW91IGFsc28gc3RvcCBhZGRpbmcgc3lzdGVtIGNhbGxzIHRvIG5hdGl2ZSAz
Mi1iaXQgc3lzdGVtcyB0aGVuPw0KPiANCj4gT24gbWVtb3J5IGNvbnN0cmFpbmVkIHN5c3RlbXMg
KGxlc3MgdGhhbiAyR0IgYS50Lm0uKSwgdGhlcmUgaXMgc3RpbGwgYQ0KPiBzdHJvbmcgZGVtYW5k
IGZvciBydW5uaW5nIDMyLWJpdCB1c2VyIHNwYWNlLCBidXQgYWxsIG9mIHRoZSByZWNlbnQgQXJt
DQo+IGNvcmVzIChhZnRlciBDb3J0ZXgtQTU1KSBkcm9wcGVkIHRoZSBhYmlsaXR5IHRvIHJ1biAz
Mi1iaXQga2VybmVscywgc28NCj4gdGhhdCBjb21wYXQgbW9kZSBtYXkgZXZlbnR1YWxseSBiZWNv
bWUgdGhlIHByaW1hcnkgd2F5IHRvIHJ1bg0KPiBMaW51eCBvbiBjaGVhcCBlbWJlZGRlZCBzeXN0
ZW1zLg0KPiANCj4gSSBkb24ndCB0aGluayB0aGVyZSBpcyBhbnkgY2hhbmNlIHdlIGNhbiByZWFs
aXN0aWNhbGx5IHRha2UgYXdheSBpb191cmluZw0KPiBmcm9tIHRoZSAzMi1iaXQgQUJJIGFueSBt
b3JlIG5vdy4NCg0KQ2FuJ3QgaXQganVzdCBydW4gcmVxdWVzdHMgZnJvbSAzMmJpdCBhcHBzIGlu
IGEga2VybmVsIHRocmVhZCB0aGF0IGhhcw0KdGhlICdpbl9jb21wYXRfc3lzY2FsbCcgZmxhZyBz
ZXQ/DQpOb3QgdGhhdCBpIHJlY2FsbCBzZWVpbmcgdGhlIGNvZGUgd2hlcmUgaXQgc2F2ZXMgdGhl
ICdjb21wYXQnIG5hdHVyZQ0Kb2YgYW55IHJlcXVlc3RzLg0KDQpJdCBpcyBhbHJlYWR5IGNvbXBs
ZXRlbHkgZipja2VkIGlmIHlvdSB0cnkgdG8gcGFzcyB0aGUgY29tbWFuZCByaW5nDQp0byBhIGNo
aWxkIHByb2Nlc3MgLSBpdCB1c2VzIHRoZSB3cm9uZyAnbW0nLg0KSSBzdXNwZWN0IHRoZXJlIGFy
ZSBzb21lIHJlYWxseSBob3JyaWQgc2VjdXJpdHkgaG9sZXMgaW4gdGhhdCBhcmVhLg0KDQoJRGF2
aWQuDQoNCi0NClJlZ2lzdGVyZWQgQWRkcmVzcyBMYWtlc2lkZSwgQnJhbWxleSBSb2FkLCBNb3Vu
dCBGYXJtLCBNaWx0b24gS2V5bmVzLCBNSzEgMVBULCBVSw0KUmVnaXN0cmF0aW9uIE5vOiAxMzk3
Mzg2IChXYWxlcykNCg=

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 19:22           ` Matthew Wilcox
  2020-09-20 19:28             ` Andy Lutomirski
@ 2020-09-20 21:42             ` Al Viro
  2020-09-21 16:26             ` Pavel Begunkov
  2 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-09-20 21:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Sun, Sep 20, 2020 at 08:22:59PM +0100, Matthew Wilcox wrote:
> On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> > IMO it's much saner to mark those and refuse to touch them from io_uring...
> 
> Simpler solution is to remove io_uring from the 32-bit syscall list.
> If you're a 32-bit process, you don't get to use io_uring.  Would
> any real users actually care about that?

What for?  I mean, is there any reason to try and keep those bugs as
first-class citizens?  IDGI...  Yes, we have several special files
(out of thousands) that have read()/write() user-visible semantics
broken wrt 32bit/64bit.  And we have to keep them working that way
for existing syscalls.  Why would we want to pretend that their
behaviour is normal and isn't an ABI bug, not to be repeated for
anything new?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 19:14       ` Andy Lutomirski
@ 2020-09-21  4:28         ` Christoph Hellwig
  0 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21  4:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Al Viro, Matthew Wilcox, Christoph Hellwig, Andrew Morton,
	Jens Axboe, Arnd Bergmann, David Howells, linux-arm-kernel,
	X86 ML, LKML, open list:MIPS, Parisc List, linuxppc-dev,
	linux-s390, sparclinux, linux-block, Linux SCSI List,
	Linux FS Devel, linux-aio, io-uring, linux-arch, Linux-MM,
	Network Development, keyrings, LSM List

On Sun, Sep 20, 2020 at 12:14:49PM -0700, Andy Lutomirski wrote:
> I wonder if this is really quite cast in stone.  We could also have
> FMODE_SHITTY_COMPAT and set that when a file like this is *opened* in
> compat mode.  Then that particular struct file would be read and
> written using the compat data format.  The change would be
> user-visible, but the user that would see it would be very strange
> indeed.
> 
> I don't have a strong opinion as to whether that is better or worse
> than denying io_uring access to these things, but at least it moves
> the special case out of io_uring.

open could have happened through an io_uring thread a well, so I don't
see how this would do anything but move the problem to a different
place.

> 
> --Andy
---end quoted text---

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: let import_iovec deal with compat_iovecs as well
  2020-09-19 14:24 ` let import_iovec deal with compat_iovecs as well David Laight
@ 2020-09-21  4:41   ` 'Christoph Hellwig'
  2020-09-21 11:11     ` David Laight
  0 siblings, 1 reply; 171+ messages in thread
From: 'Christoph Hellwig' @ 2020-09-21  4:41 UTC (permalink / raw)
  To: David Laight
  Cc: 'Christoph Hellwig',
	Alexander Viro, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Sat, Sep 19, 2020 at 02:24:10PM +0000, David Laight wrote:
> I thought about that change while writing my import_iovec() => iovec_import()
> patch - and thought that the io_uring code would (as usual) cause grief.
> 
> Christoph - did you see those patches?

No.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: let import_iovec deal with compat_iovecs as well
  2020-09-21  4:41   ` 'Christoph Hellwig'
@ 2020-09-21 11:11     ` David Laight
  0 siblings, 0 replies; 171+ messages in thread
From: David Laight @ 2020-09-21 11:11 UTC (permalink / raw)
  To: 'Christoph Hellwig'
  Cc: Alexander Viro, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

> On Sat, Sep 19, 2020 at 02:24:10PM +0000, David Laight wrote:
> > I thought about that change while writing my import_iovec() => iovec_import()
> > patch - and thought that the io_uring code would (as usual) cause grief.
> >
> > Christoph - did you see those patches?

Link to cover email.

https://lkml.org/lkml/2020/9/15/661

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* let import_iovec deal with compat_iovecs as well v2
  2020-09-18 12:45 let import_iovec deal with compat_iovecs as well Christoph Hellwig
                   ` (9 preceding siblings ...)
  2020-09-19 14:24 ` let import_iovec deal with compat_iovecs as well David Laight
@ 2020-09-21 14:34 ` Christoph Hellwig
  2020-09-21 14:34   ` [PATCH 01/11] compat.h: fix a spelling error in <linux/compat.h> Christoph Hellwig
                     ` (10 more replies)
  2020-09-25  4:51 ` let import_iovec deal with compat_iovecs as well v4 Christoph Hellwig
  11 siblings, 11 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Hi Al,

this series changes import_iovec to transparently deal with comat iovec
structures, and then cleanups up a lot of code dupliation.

Changes since v1:
 - improve a commit message
 - drop a pointless unlikely
 - drop the PF_FORCE_COMPAT flag
 - add a few more cleanups (including two from David Laight)

Diffstat:
 arch/arm64/include/asm/unistd32.h                  |   10 
 arch/mips/kernel/syscalls/syscall_n32.tbl          |   10 
 arch/mips/kernel/syscalls/syscall_o32.tbl          |   10 
 arch/parisc/kernel/syscalls/syscall.tbl            |   10 
 arch/powerpc/kernel/syscalls/syscall.tbl           |   10 
 arch/s390/kernel/syscalls/syscall.tbl              |   10 
 arch/sparc/kernel/syscalls/syscall.tbl             |   10 
 arch/x86/entry/syscall_x32.c                       |    5 
 arch/x86/entry/syscalls/syscall_32.tbl             |   10 
 arch/x86/entry/syscalls/syscall_64.tbl             |   10 
 block/scsi_ioctl.c                                 |   12 
 drivers/scsi/sg.c                                  |    9 
 fs/aio.c                                           |   38 --
 fs/io_uring.c                                      |   20 -
 fs/read_write.c                                    |  362 +--------------------
 fs/splice.c                                        |   57 ---
 include/linux/compat.h                             |   24 -
 include/linux/fs.h                                 |   11 
 include/linux/uio.h                                |   10 
 include/uapi/asm-generic/unistd.h                  |   12 
 lib/iov_iter.c                                     |  161 +++++++--
 mm/process_vm_access.c                             |   85 ----
 net/compat.c                                       |    4 
 security/keys/compat.c                             |   37 --
 security/keys/internal.h                           |    5 
 security/keys/keyctl.c                             |    2 
 tools/include/uapi/asm-generic/unistd.h            |   12 
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |   10 
 tools/perf/arch/s390/entry/syscalls/syscall.tbl    |   10 
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |   10 
 30 files changed, 280 insertions(+), 706 deletions(-)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 01/11] compat.h: fix a spelling error in <linux/compat.h>
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
@ 2020-09-21 14:34   ` Christoph Hellwig
  2020-09-21 14:34   ` [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw() Christoph Hellwig
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

There is no compat_sys_readv64v2 syscall, only a compat_sys_preadv64v2
one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/compat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index b354ce58966e2d..654c1ec36671a4 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -812,7 +812,7 @@ asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
 		const struct compat_iovec __user *vec,
 		compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
-asmlinkage long  compat_sys_readv64v2(unsigned long fd,
+asmlinkage long  compat_sys_preadv64v2(unsigned long fd,
 		const struct compat_iovec __user *vec,
 		unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw()
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
  2020-09-21 14:34   ` [PATCH 01/11] compat.h: fix a spelling error in <linux/compat.h> Christoph Hellwig
@ 2020-09-21 14:34   ` Christoph Hellwig
  2020-09-21 14:48     ` Matthew Wilcox
  2020-09-21 15:02     ` Al Viro
  2020-09-21 14:34   ` [PATCH 03/11] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c and mark it static Christoph Hellwig
                     ` (8 subsequent siblings)
  10 siblings, 2 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module, David Laight,
	David Laight

From: David Laight <David.Laight@ACULAB.COM>

This is the only direct call of rw_copy_check_uvector().  Removing it
will allow rw_copy_check_uvector() to be inlined into import_iovec(),
while only paying a minor price by setting up an otherwise unused
iov_iter in the process_vm_readv/process_vm_writev syscalls that aren't
in a super hot path.

Signed-off-by: David Laight <david.laight@aculab.com>
[hch: expanded the commit log, pass CHECK_IOVEC_ONLY instead of 0 for the
      compat case, handle CHECK_IOVEC_ONLY in iov_iter_init]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 lib/iov_iter.c         |  2 +-
 mm/process_vm_access.c | 33 ++++++++++++++++++---------------
 2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5e40786c8f1232..db54588406dfae 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -443,7 +443,7 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 			const struct iovec *iov, unsigned long nr_segs,
 			size_t count)
 {
-	WARN_ON(direction & ~(READ | WRITE));
+	WARN_ON(direction & ~(READ | WRITE | CHECK_IOVEC_ONLY));
 	direction &= READ | WRITE;
 
 	/* It will get better.  Eventually... */
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 29c052099affdc..40cd502c337534 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -264,7 +264,7 @@ static ssize_t process_vm_rw(pid_t pid,
 	struct iovec iovstack_r[UIO_FASTIOV];
 	struct iovec *iov_l = iovstack_l;
 	struct iovec *iov_r = iovstack_r;
-	struct iov_iter iter;
+	struct iov_iter iter_l, iter_r;
 	ssize_t rc;
 	int dir = vm_write ? WRITE : READ;
 
@@ -272,23 +272,25 @@ static ssize_t process_vm_rw(pid_t pid,
 		return -EINVAL;
 
 	/* Check iovecs */
-	rc = import_iovec(dir, lvec, liovcnt, UIO_FASTIOV, &iov_l, &iter);
+	rc = import_iovec(dir, lvec, liovcnt, UIO_FASTIOV, &iov_l, &iter_l);
 	if (rc < 0)
 		return rc;
-	if (!iov_iter_count(&iter))
+	if (!iov_iter_count(&iter_l))
 		goto free_iovecs;
 
-	rc = rw_copy_check_uvector(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV,
-				   iovstack_r, &iov_r);
+	rc = import_iovec(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV, &iov_r,
+			  &iter_r);
 	if (rc <= 0)
 		goto free_iovecs;
 
-	rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
+	rc = process_vm_rw_core(pid, &iter_l, iter_r.iov, iter_r.nr_segs,
+				flags, vm_write);
 
 free_iovecs:
 	if (iov_r != iovstack_r)
 		kfree(iov_r);
-	kfree(iov_l);
+	if (iov_l != iovstack_l)
+		kfree(iov_l);
 
 	return rc;
 }
@@ -322,30 +324,31 @@ compat_process_vm_rw(compat_pid_t pid,
 	struct iovec iovstack_r[UIO_FASTIOV];
 	struct iovec *iov_l = iovstack_l;
 	struct iovec *iov_r = iovstack_r;
-	struct iov_iter iter;
+	struct iov_iter iter_l, iter_r;
 	ssize_t rc = -EFAULT;
 	int dir = vm_write ? WRITE : READ;
 
 	if (flags != 0)
 		return -EINVAL;
 
-	rc = compat_import_iovec(dir, lvec, liovcnt, UIO_FASTIOV, &iov_l, &iter);
+	rc = compat_import_iovec(dir, lvec, liovcnt, UIO_FASTIOV, &iov_l, &iter_l);
 	if (rc < 0)
 		return rc;
-	if (!iov_iter_count(&iter))
+	if (!iov_iter_count(&iter_l))
 		goto free_iovecs;
-	rc = compat_rw_copy_check_uvector(CHECK_IOVEC_ONLY, rvec, riovcnt,
-					  UIO_FASTIOV, iovstack_r,
-					  &iov_r);
+	rc = compat_import_iovec(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV,
+				 &iov_r, &iter_r);
 	if (rc <= 0)
 		goto free_iovecs;
 
-	rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
+	rc = process_vm_rw_core(pid, &iter_l, iter_r.iov, iter_r.nr_segs,
+				flags, vm_write);
 
 free_iovecs:
 	if (iov_r != iovstack_r)
 		kfree(iov_r);
-	kfree(iov_l);
+	if (iov_l != iovstack_l)
+		kfree(iov_l);
 	return rc;
 }
 
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 03/11] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c and mark it static
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
  2020-09-21 14:34   ` [PATCH 01/11] compat.h: fix a spelling error in <linux/compat.h> Christoph Hellwig
  2020-09-21 14:34   ` [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw() Christoph Hellwig
@ 2020-09-21 14:34   ` Christoph Hellwig
  2020-09-21 14:34   ` [PATCH 04/11] iov_iter: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector Christoph Hellwig
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module, David Laight,
	David Laight

From: David Laight <David.Laight@ACULAB.COM>

This lets the compiler inline it into import_iovec() generating
much better code.

Signed-off-by: David Laight <david.laight@aculab.com>
[hch: drop the now pointless kerneldoc for a static function, and update
      a few other comments]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c        | 179 -----------------------------------------
 include/linux/compat.h |   6 --
 include/linux/fs.h     |  11 +--
 lib/iov_iter.c         | 150 +++++++++++++++++++++++++++++++++-
 4 files changed, 151 insertions(+), 195 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c78d0dd..e5e891a88442ef 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -752,185 +752,6 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
 	return ret;
 }
 
-/**
- * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
- *     into the kernel and check that it is valid.
- *
- * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
- * @uvector: Pointer to the userspace array.
- * @nr_segs: Number of elements in userspace array.
- * @fast_segs: Number of elements in @fast_pointer.
- * @fast_pointer: Pointer to (usually small on-stack) kernel array.
- * @ret_pointer: (output parameter) Pointer to a variable that will point to
- *     either @fast_pointer, a newly allocated kernel array, or NULL,
- *     depending on which array was used.
- *
- * This function copies an array of &struct iovec of @nr_segs from
- * userspace into the kernel and checks that each element is valid (e.g.
- * it does not point to a kernel address or cause overflow by being too
- * large, etc.).
- *
- * As an optimization, the caller may provide a pointer to a small
- * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
- * (the size of this array, or 0 if unused, should be given in @fast_segs).
- *
- * @ret_pointer will always point to the array that was used, so the
- * caller must take care not to call kfree() on it e.g. in case the
- * @fast_pointer array was used and it was allocated on the stack.
- *
- * Return: The total number of bytes covered by the iovec array on success
- *   or a negative error code on error.
- */
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
-			      unsigned long nr_segs, unsigned long fast_segs,
-			      struct iovec *fast_pointer,
-			      struct iovec **ret_pointer)
-{
-	unsigned long seg;
-	ssize_t ret;
-	struct iovec *iov = fast_pointer;
-
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs = 0) {
-		ret = 0;
-		goto out;
-	}
-
-	/*
-	 * First get the "struct iovec" from user memory and
-	 * verify all the pointers
-	 */
-	if (nr_segs > UIO_MAXIOV) {
-		ret = -EINVAL;
-		goto out;
-	}
-	if (nr_segs > fast_segs) {
-		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL) {
-			ret = -ENOMEM;
-			goto out;
-		}
-	}
-	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	/*
-	 * According to the Single Unix Specification we should return EINVAL
-	 * if an element length is < 0 when cast to ssize_t or if the
-	 * total length would overflow the ssize_t return value of the
-	 * system call.
-	 *
-	 * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
-	 * overflow case.
-	 */
-	ret = 0;
-	for (seg = 0; seg < nr_segs; seg++) {
-		void __user *buf = iov[seg].iov_base;
-		ssize_t len = (ssize_t)iov[seg].iov_len;
-
-		/* see if we we're about to use an invalid len or if
-		 * it's about to overflow ssize_t */
-		if (len < 0) {
-			ret = -EINVAL;
-			goto out;
-		}
-		if (type >= 0
-		    && unlikely(!access_ok(buf, len))) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - ret) {
-			len = MAX_RW_COUNT - ret;
-			iov[seg].iov_len = len;
-		}
-		ret += len;
-	}
-out:
-	*ret_pointer = iov;
-	return ret;
-}
-
-#ifdef CONFIG_COMPAT
-ssize_t compat_rw_copy_check_uvector(int type,
-		const struct compat_iovec __user *uvector, unsigned long nr_segs,
-		unsigned long fast_segs, struct iovec *fast_pointer,
-		struct iovec **ret_pointer)
-{
-	compat_ssize_t tot_len;
-	struct iovec *iov = *ret_pointer = fast_pointer;
-	ssize_t ret = 0;
-	int seg;
-
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs = 0)
-		goto out;
-
-	ret = -EINVAL;
-	if (nr_segs > UIO_MAXIOV)
-		goto out;
-	if (nr_segs > fast_segs) {
-		ret = -ENOMEM;
-		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL)
-			goto out;
-	}
-	*ret_pointer = iov;
-
-	ret = -EFAULT;
-	if (!access_ok(uvector, nr_segs*sizeof(*uvector)))
-		goto out;
-
-	/*
-	 * Single unix specification:
-	 * We should -EINVAL if an element length is not >= 0 and fitting an
-	 * ssize_t.
-	 *
-	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
-	 * no overflow possibility.
-	 */
-	tot_len = 0;
-	ret = -EINVAL;
-	for (seg = 0; seg < nr_segs; seg++) {
-		compat_uptr_t buf;
-		compat_ssize_t len;
-
-		if (__get_user(len, &uvector->iov_len) ||
-		   __get_user(buf, &uvector->iov_base)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
-			goto out;
-		if (type >= 0 &&
-		    !access_ok(compat_ptr(buf), len)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - tot_len)
-			len = MAX_RW_COUNT - tot_len;
-		tot_len += len;
-		iov->iov_base = compat_ptr(buf);
-		iov->iov_len = (compat_size_t) len;
-		uvector++;
-		iov++;
-	}
-	ret = tot_len;
-
-out:
-	return ret;
-}
-#endif
-
 static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
 		loff_t *pos, rwf_t flags)
 {
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 654c1ec36671a4..b930de791ff16b 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -451,12 +451,6 @@ extern long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 
 struct epoll_event;	/* fortunately, this one is fixed-layout */
 
-extern ssize_t compat_rw_copy_check_uvector(int type,
-		const struct compat_iovec __user *uvector,
-		unsigned long nr_segs,
-		unsigned long fast_segs, struct iovec *fast_pointer,
-		struct iovec **ret_pointer);
-
 extern void __user *compat_alloc_user_space(unsigned long len);
 
 int compat_restore_altstack(const compat_stack_t __user *uss);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a082c..5420822104a99d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -179,10 +179,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 #define FMODE_BUF_RASYNC	((__force fmode_t)0x40000000)
 
 /*
- * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
- * that indicates that they should check the contents of the iovec are
- * valid, but not check the memory that the iovec elements
- * points too.
+ * Flag for import_iovec that indicates that it should check the contents of the
+ * iovec is valid, but not check the memory that the iovec elements points too.
  */
 #define CHECK_IOVEC_ONLY -1
 
@@ -1887,11 +1885,6 @@ static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
 	return file->f_op->mmap(file, vma);
 }
 
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
-			      unsigned long nr_segs, unsigned long fast_segs,
-			      struct iovec *fast_pointer,
-			      struct iovec **ret_pointer);
-
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index db54588406dfae..d7e72343c360eb 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1650,12 +1650,87 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 }
 EXPORT_SYMBOL(dup_iter);
 
+static ssize_t rw_copy_check_uvector(int type,
+		const struct iovec __user *uvector, unsigned long nr_segs,
+		unsigned long fast_segs, struct iovec *fast_pointer,
+		struct iovec **ret_pointer)
+{
+	unsigned long seg;
+	ssize_t ret;
+	struct iovec *iov = fast_pointer;
+
+	/*
+	 * SuS says "The readv() function *may* fail if the iovcnt argument
+	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
+	 * traditionally returned zero for zero segments, so...
+	 */
+	if (nr_segs = 0) {
+		ret = 0;
+		goto out;
+	}
+
+	/*
+	 * First get the "struct iovec" from user memory and
+	 * verify all the pointers
+	 */
+	if (nr_segs > UIO_MAXIOV) {
+		ret = -EINVAL;
+		goto out;
+	}
+	if (nr_segs > fast_segs) {
+		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
+		if (iov = NULL) {
+			ret = -ENOMEM;
+			goto out;
+		}
+	}
+	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	/*
+	 * According to the Single Unix Specification we should return EINVAL
+	 * if an element length is < 0 when cast to ssize_t or if the
+	 * total length would overflow the ssize_t return value of the
+	 * system call.
+	 *
+	 * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
+	 * overflow case.
+	 */
+	ret = 0;
+	for (seg = 0; seg < nr_segs; seg++) {
+		void __user *buf = iov[seg].iov_base;
+		ssize_t len = (ssize_t)iov[seg].iov_len;
+
+		/* see if we we're about to use an invalid len or if
+		 * it's about to overflow ssize_t */
+		if (len < 0) {
+			ret = -EINVAL;
+			goto out;
+		}
+		if (type >= 0
+		    && unlikely(!access_ok(buf, len))) {
+			ret = -EFAULT;
+			goto out;
+		}
+		if (len > MAX_RW_COUNT - ret) {
+			len = MAX_RW_COUNT - ret;
+			iov[seg].iov_len = len;
+		}
+		ret += len;
+	}
+out:
+	*ret_pointer = iov;
+	return ret;
+}
+
 /**
  * import_iovec() - Copy an array of &struct iovec from userspace
  *     into the kernel, check that it is valid, and initialize a new
  *     &struct iov_iter iterator to access it.
  *
- * @type: One of %READ or %WRITE.
+ * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
  * @uvector: Pointer to the userspace array.
  * @nr_segs: Number of elements in userspace array.
  * @fast_segs: Number of elements in @iov.
@@ -1695,6 +1770,79 @@ EXPORT_SYMBOL(import_iovec);
 #ifdef CONFIG_COMPAT
 #include <linux/compat.h>
 
+static ssize_t compat_rw_copy_check_uvector(int type,
+		const struct compat_iovec __user *uvector, unsigned long nr_segs,
+		unsigned long fast_segs, struct iovec *fast_pointer,
+		struct iovec **ret_pointer)
+{
+	compat_ssize_t tot_len;
+	struct iovec *iov = *ret_pointer = fast_pointer;
+	ssize_t ret = 0;
+	int seg;
+
+	/*
+	 * SuS says "The readv() function *may* fail if the iovcnt argument
+	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
+	 * traditionally returned zero for zero segments, so...
+	 */
+	if (nr_segs = 0)
+		goto out;
+
+	ret = -EINVAL;
+	if (nr_segs > UIO_MAXIOV)
+		goto out;
+	if (nr_segs > fast_segs) {
+		ret = -ENOMEM;
+		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
+		if (iov = NULL)
+			goto out;
+	}
+	*ret_pointer = iov;
+
+	ret = -EFAULT;
+	if (!access_ok(uvector, nr_segs*sizeof(*uvector)))
+		goto out;
+
+	/*
+	 * Single unix specification:
+	 * We should -EINVAL if an element length is not >= 0 and fitting an
+	 * ssize_t.
+	 *
+	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
+	 * no overflow possibility.
+	 */
+	tot_len = 0;
+	ret = -EINVAL;
+	for (seg = 0; seg < nr_segs; seg++) {
+		compat_uptr_t buf;
+		compat_ssize_t len;
+
+		if (__get_user(len, &uvector->iov_len) ||
+		   __get_user(buf, &uvector->iov_base)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
+			goto out;
+		if (type >= 0 &&
+		    !access_ok(compat_ptr(buf), len)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		if (len > MAX_RW_COUNT - tot_len)
+			len = MAX_RW_COUNT - tot_len;
+		tot_len += len;
+		iov->iov_base = compat_ptr(buf);
+		iov->iov_len = (compat_size_t) len;
+		uvector++;
+		iov++;
+	}
+	ret = tot_len;
+
+out:
+	return ret;
+}
+
 ssize_t compat_import_iovec(int type,
 		const struct compat_iovec __user * uvector,
 		unsigned nr_segs, unsigned fast_segs,
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 04/11] iov_iter: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
                     ` (2 preceding siblings ...)
  2020-09-21 14:34   ` [PATCH 03/11] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c and mark it static Christoph Hellwig
@ 2020-09-21 14:34   ` Christoph Hellwig
  2020-09-21 15:05     ` David Laight
  2020-09-21 15:07     ` Al Viro
  2020-09-21 14:34   ` [PATCH 05/11] iov_iter: merge the compat case into rw_copy_check_uvector Christoph Hellwig
                     ` (6 subsequent siblings)
  10 siblings, 2 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Explicitly check for the magic value insted of implicitly relying on
its numeric representation.   Also drop the rather pointless unlikely
annotation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 lib/iov_iter.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index d7e72343c360eb..a64867501a7483 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1709,8 +1709,7 @@ static ssize_t rw_copy_check_uvector(int type,
 			ret = -EINVAL;
 			goto out;
 		}
-		if (type >= 0
-		    && unlikely(!access_ok(buf, len))) {
+		if (type != CHECK_IOVEC_ONLY && !access_ok(buf, len)) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -1824,7 +1823,7 @@ static ssize_t compat_rw_copy_check_uvector(int type,
 		}
 		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
 			goto out;
-		if (type >= 0 &&
+		if (type != CHECK_IOVEC_ONLY &&
 		    !access_ok(compat_ptr(buf), len)) {
 			ret = -EFAULT;
 			goto out;
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 05/11] iov_iter: merge the compat case into rw_copy_check_uvector
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
                     ` (3 preceding siblings ...)
  2020-09-21 14:34   ` [PATCH 04/11] iov_iter: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector Christoph Hellwig
@ 2020-09-21 14:34   ` Christoph Hellwig
  2020-09-21 15:14     ` Al Viro
  2021-01-08 11:49     ` David Laight
  2020-09-21 14:34   ` [PATCH 06/11] iov_iter: handle the compat case in import_iovec Christoph Hellwig
                     ` (5 subsequent siblings)
  10 siblings, 2 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Stop duplicating the iovec verify code, and instead add add a
__import_iovec helper that does the whole verify and import, but takes
a bool compat to decided on the native or compat layout.  This also
ends up massively simplifying the calling conventions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 lib/iov_iter.c | 195 ++++++++++++++++++-------------------------------
 1 file changed, 70 insertions(+), 125 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index a64867501a7483..8bfa47b63d39aa 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -10,6 +10,7 @@
 #include <net/checksum.h>
 #include <linux/scatterlist.h>
 #include <linux/instrumented.h>
+#include <linux/compat.h>
 
 #define PIPE_PARANOIA /* for now */
 
@@ -1650,43 +1651,76 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 }
 EXPORT_SYMBOL(dup_iter);
 
-static ssize_t rw_copy_check_uvector(int type,
-		const struct iovec __user *uvector, unsigned long nr_segs,
-		unsigned long fast_segs, struct iovec *fast_pointer,
-		struct iovec **ret_pointer)
+static int compat_copy_iovecs_from_user(struct iovec *iov,
+		const struct iovec __user *uvector, unsigned long nr_segs)
+{
+	const struct compat_iovec __user *uiov +		(const struct compat_iovec __user *)uvector;
+	unsigned long i;
+	int ret = -EFAULT;
+
+	if (!user_access_begin(uvector, nr_segs * sizeof(*uvector)))
+		return -EFAULT;
+
+	for (i = 0; i < nr_segs; i++) {
+		compat_uptr_t buf;
+		compat_ssize_t len;
+
+		unsafe_get_user(len, &uiov[i].iov_len, out);
+		unsafe_get_user(buf, &uiov[i].iov_base, out);
+
+		/* check for compat_size_t not fitting in compat_ssize_t .. */
+		if (len < 0) {
+			ret = -EINVAL;
+			goto out;
+		}
+		iov[i].iov_base = compat_ptr(buf);
+		iov[i].iov_len = len;
+	}
+	ret = 0;
+out:
+	user_access_end();
+	return ret;
+}
+
+static ssize_t __import_iovec(int type, const struct iovec __user *uvector,
+		unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+		struct iov_iter *i, bool compat)
 {
+	struct iovec *iov = *iovp;
 	unsigned long seg;
-	ssize_t ret;
-	struct iovec *iov = fast_pointer;
+	ssize_t ret = 0;
 
 	/*
 	 * SuS says "The readv() function *may* fail if the iovcnt argument
 	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
 	 * traditionally returned zero for zero segments, so...
 	 */
-	if (nr_segs = 0) {
-		ret = 0;
-		goto out;
-	}
+	if (nr_segs = 0)
+		goto done;
 
 	/*
 	 * First get the "struct iovec" from user memory and
 	 * verify all the pointers
 	 */
-	if (nr_segs > UIO_MAXIOV) {
-		ret = -EINVAL;
-		goto out;
-	}
+	ret = -EINVAL;
+	if (nr_segs > UIO_MAXIOV)
+		goto fail;
 	if (nr_segs > fast_segs) {
+		ret = -ENOMEM;
 		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		if (!iov)
+			goto fail;
 	}
-	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
+
+	if (compat) {
+		ret = compat_copy_iovecs_from_user(iov, uvector, nr_segs);
+		if (ret)
+			goto fail;
+	} else {
 		ret = -EFAULT;
-		goto out;
+		if (copy_from_user(iov, uvector, nr_segs * sizeof(*uvector)))
+			goto fail;
 	}
 
 	/*
@@ -1707,11 +1741,11 @@ static ssize_t rw_copy_check_uvector(int type,
 		 * it's about to overflow ssize_t */
 		if (len < 0) {
 			ret = -EINVAL;
-			goto out;
+			goto fail;
 		}
 		if (type != CHECK_IOVEC_ONLY && !access_ok(buf, len)) {
 			ret = -EFAULT;
-			goto out;
+			goto fail;
 		}
 		if (len > MAX_RW_COUNT - ret) {
 			len = MAX_RW_COUNT - ret;
@@ -1719,8 +1753,17 @@ static ssize_t rw_copy_check_uvector(int type,
 		}
 		ret += len;
 	}
-out:
-	*ret_pointer = iov;
+done:
+	iov_iter_init(i, type, iov, nr_segs, ret);
+	if (iov = *iovp)
+		*iovp = NULL;
+	else
+		*iovp = iov;
+	return ret;
+fail:
+	if (iov != *iovp)
+		kfree(iov);
+	*iovp = NULL;
 	return ret;
 }
 
@@ -1750,116 +1793,18 @@ ssize_t import_iovec(int type, const struct iovec __user * uvector,
 		 unsigned nr_segs, unsigned fast_segs,
 		 struct iovec **iov, struct iov_iter *i)
 {
-	ssize_t n;
-	struct iovec *p;
-	n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
-				  *iov, &p);
-	if (n < 0) {
-		if (p != *iov)
-			kfree(p);
-		*iov = NULL;
-		return n;
-	}
-	iov_iter_init(i, type, p, nr_segs, n);
-	*iov = p = *iov ? NULL : p;
-	return n;
+	return __import_iovec(type, uvector, nr_segs, fast_segs, iov, i, false);
 }
 EXPORT_SYMBOL(import_iovec);
 
 #ifdef CONFIG_COMPAT
-#include <linux/compat.h>
-
-static ssize_t compat_rw_copy_check_uvector(int type,
-		const struct compat_iovec __user *uvector, unsigned long nr_segs,
-		unsigned long fast_segs, struct iovec *fast_pointer,
-		struct iovec **ret_pointer)
-{
-	compat_ssize_t tot_len;
-	struct iovec *iov = *ret_pointer = fast_pointer;
-	ssize_t ret = 0;
-	int seg;
-
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs = 0)
-		goto out;
-
-	ret = -EINVAL;
-	if (nr_segs > UIO_MAXIOV)
-		goto out;
-	if (nr_segs > fast_segs) {
-		ret = -ENOMEM;
-		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL)
-			goto out;
-	}
-	*ret_pointer = iov;
-
-	ret = -EFAULT;
-	if (!access_ok(uvector, nr_segs*sizeof(*uvector)))
-		goto out;
-
-	/*
-	 * Single unix specification:
-	 * We should -EINVAL if an element length is not >= 0 and fitting an
-	 * ssize_t.
-	 *
-	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
-	 * no overflow possibility.
-	 */
-	tot_len = 0;
-	ret = -EINVAL;
-	for (seg = 0; seg < nr_segs; seg++) {
-		compat_uptr_t buf;
-		compat_ssize_t len;
-
-		if (__get_user(len, &uvector->iov_len) ||
-		   __get_user(buf, &uvector->iov_base)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
-			goto out;
-		if (type != CHECK_IOVEC_ONLY &&
-		    !access_ok(compat_ptr(buf), len)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - tot_len)
-			len = MAX_RW_COUNT - tot_len;
-		tot_len += len;
-		iov->iov_base = compat_ptr(buf);
-		iov->iov_len = (compat_size_t) len;
-		uvector++;
-		iov++;
-	}
-	ret = tot_len;
-
-out:
-	return ret;
-}
-
 ssize_t compat_import_iovec(int type,
 		const struct compat_iovec __user * uvector,
 		unsigned nr_segs, unsigned fast_segs,
 		struct iovec **iov, struct iov_iter *i)
 {
-	ssize_t n;
-	struct iovec *p;
-	n = compat_rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
-				  *iov, &p);
-	if (n < 0) {
-		if (p != *iov)
-			kfree(p);
-		*iov = NULL;
-		return n;
-	}
-	iov_iter_init(i, type, p, nr_segs, n);
-	*iov = p = *iov ? NULL : p;
-	return n;
+	return __import_iovec(type, (const struct iovec __user *)uvector,
+			      nr_segs, fast_segs, iov, i, true);
 }
 EXPORT_SYMBOL(compat_import_iovec);
 #endif
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 06/11] iov_iter: handle the compat case in import_iovec
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
                     ` (4 preceding siblings ...)
  2020-09-21 14:34   ` [PATCH 05/11] iov_iter: merge the compat case into rw_copy_check_uvector Christoph Hellwig
@ 2020-09-21 14:34   ` Christoph Hellwig
  2020-09-21 15:20     ` Al Viro
  2020-09-21 14:34   ` [PATCH 07/11] fs: remove various compat readv/writev helpers Christoph Hellwig
                     ` (4 subsequent siblings)
  10 siblings, 1 reply; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Use in compat_syscall to import either native or the compat iovecs, and
remove the now superflous compat_import_iovec, which removes the need for
special compat logic in most callers.  Only io_uring needs special
treatment given that it can call import_iovec from kernel threads acting
on behalf of native or compat syscalls.  Expose the low-level
__import_iovec helper and use it in io_uring to explicitly pick a iovec
layout.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/scsi_ioctl.c     | 12 ++----------
 drivers/scsi/sg.c      |  9 +--------
 fs/aio.c               | 38 ++++++++++++++------------------------
 fs/io_uring.c          | 20 ++++++++------------
 fs/read_write.c        |  6 ++++--
 fs/splice.c            |  2 +-
 include/linux/uio.h    | 10 +++-------
 lib/iov_iter.c         | 17 +++--------------
 mm/process_vm_access.c |  7 ++++---
 net/compat.c           |  4 ++--
 security/keys/compat.c |  5 ++---
 11 files changed, 44 insertions(+), 86 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index ef722f04f88a93..e08df86866ee5d 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -333,16 +333,8 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
 		struct iov_iter i;
 		struct iovec *iov = NULL;
 
-#ifdef CONFIG_COMPAT
-		if (in_compat_syscall())
-			ret = compat_import_iovec(rq_data_dir(rq),
-				   hdr->dxferp, hdr->iovec_count,
-				   0, &iov, &i);
-		else
-#endif
-			ret = import_iovec(rq_data_dir(rq),
-				   hdr->dxferp, hdr->iovec_count,
-				   0, &iov, &i);
+		ret = import_iovec(rq_data_dir(rq), hdr->dxferp,
+				   hdr->iovec_count, 0, &iov, &i);
 		if (ret < 0)
 			goto out_free_cdb;
 
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 20472aaaf630a4..bfa8d77322d732 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1820,14 +1820,7 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
 		struct iovec *iov = NULL;
 		struct iov_iter i;
 
-#ifdef CONFIG_COMPAT
-		if (in_compat_syscall())
-			res = compat_import_iovec(rw, hp->dxferp, iov_count,
-						  0, &iov, &i);
-		else
-#endif
-			res = import_iovec(rw, hp->dxferp, iov_count,
-					   0, &iov, &i);
+		res = import_iovec(rw, hp->dxferp, iov_count, 0, &iov, &i);
 		if (res < 0)
 			return res;
 
diff --git a/fs/aio.c b/fs/aio.c
index d5ec303855669d..b377f5c2048e18 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1478,8 +1478,7 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
 }
 
 static ssize_t aio_setup_rw(int rw, const struct iocb *iocb,
-		struct iovec **iovec, bool vectored, bool compat,
-		struct iov_iter *iter)
+		struct iovec **iovec, bool vectored, struct iov_iter *iter)
 {
 	void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf;
 	size_t len = iocb->aio_nbytes;
@@ -1489,11 +1488,6 @@ static ssize_t aio_setup_rw(int rw, const struct iocb *iocb,
 		*iovec = NULL;
 		return ret;
 	}
-#ifdef CONFIG_COMPAT
-	if (compat)
-		return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec,
-				iter);
-#endif
 	return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
 }
 
@@ -1517,8 +1511,7 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
 	}
 }
 
-static int aio_read(struct kiocb *req, const struct iocb *iocb,
-			bool vectored, bool compat)
+static int aio_read(struct kiocb *req, const struct iocb *iocb, bool vectored)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
@@ -1535,7 +1528,7 @@ static int aio_read(struct kiocb *req, const struct iocb *iocb,
 	if (unlikely(!file->f_op->read_iter))
 		return -EINVAL;
 
-	ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter);
+	ret = aio_setup_rw(READ, iocb, &iovec, vectored, &iter);
 	if (ret < 0)
 		return ret;
 	ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
@@ -1545,8 +1538,7 @@ static int aio_read(struct kiocb *req, const struct iocb *iocb,
 	return ret;
 }
 
-static int aio_write(struct kiocb *req, const struct iocb *iocb,
-			 bool vectored, bool compat)
+static int aio_write(struct kiocb *req, const struct iocb *iocb, bool vectored)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
@@ -1563,7 +1555,7 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb,
 	if (unlikely(!file->f_op->write_iter))
 		return -EINVAL;
 
-	ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter);
+	ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, &iter);
 	if (ret < 0)
 		return ret;
 	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
@@ -1799,8 +1791,7 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 }
 
 static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
-			   struct iocb __user *user_iocb, struct aio_kiocb *req,
-			   bool compat)
+			   struct iocb __user *user_iocb, struct aio_kiocb *req)
 {
 	req->ki_filp = fget(iocb->aio_fildes);
 	if (unlikely(!req->ki_filp))
@@ -1833,13 +1824,13 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 
 	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
-		return aio_read(&req->rw, iocb, false, compat);
+		return aio_read(&req->rw, iocb, false);
 	case IOCB_CMD_PWRITE:
-		return aio_write(&req->rw, iocb, false, compat);
+		return aio_write(&req->rw, iocb, false);
 	case IOCB_CMD_PREADV:
-		return aio_read(&req->rw, iocb, true, compat);
+		return aio_read(&req->rw, iocb, true);
 	case IOCB_CMD_PWRITEV:
-		return aio_write(&req->rw, iocb, true, compat);
+		return aio_write(&req->rw, iocb, true);
 	case IOCB_CMD_FSYNC:
 		return aio_fsync(&req->fsync, iocb, false);
 	case IOCB_CMD_FDSYNC:
@@ -1852,8 +1843,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 	}
 }
 
-static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
-			 bool compat)
+static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb)
 {
 	struct aio_kiocb *req;
 	struct iocb iocb;
@@ -1882,7 +1872,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	if (unlikely(!req))
 		return -EAGAIN;
 
-	err = __io_submit_one(ctx, &iocb, user_iocb, req, compat);
+	err = __io_submit_one(ctx, &iocb, user_iocb, req);
 
 	/* Done with the synchronous reference */
 	iocb_put(req);
@@ -1941,7 +1931,7 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 			break;
 		}
 
-		ret = io_submit_one(ctx, user_iocb, false);
+		ret = io_submit_one(ctx, user_iocb);
 		if (ret)
 			break;
 	}
@@ -1983,7 +1973,7 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 			break;
 		}
 
-		ret = io_submit_one(ctx, compat_ptr(user_iocb), true);
+		ret = io_submit_one(ctx, compat_ptr(user_iocb));
 		if (ret)
 			break;
 	}
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3790c7fe9fee22..ba84ecea7cb1a4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2837,13 +2837,8 @@ static ssize_t __io_import_iovec(int rw, struct io_kiocb *req,
 		return ret;
 	}
 
-#ifdef CONFIG_COMPAT
-	if (req->ctx->compat)
-		return compat_import_iovec(rw, buf, sqe_len, UIO_FASTIOV,
-						iovec, iter);
-#endif
-
-	return import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter);
+	return __import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter,
+			      req->ctx->compat);
 }
 
 static ssize_t io_import_iovec(int rw, struct io_kiocb *req,
@@ -4179,8 +4174,9 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
 				sr->len);
 		iomsg->iov = NULL;
 	} else {
-		ret = import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
-					&iomsg->iov, &iomsg->msg.msg_iter);
+		ret = __import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
+				     &iomsg->iov, &iomsg->msg.msg_iter,
+				     false);
 		if (ret > 0)
 			ret = 0;
 	}
@@ -4220,9 +4216,9 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
 		sr->len = iomsg->iov[0].iov_len;
 		iomsg->iov = NULL;
 	} else {
-		ret = compat_import_iovec(READ, uiov, len, UIO_FASTIOV,
-						&iomsg->iov,
-						&iomsg->msg.msg_iter);
+		ret = __import_iovec(READ, (struct iovec __user *)uiov, len,
+				   UIO_FASTIOV, &iomsg->iov,
+				   &iomsg->msg.msg_iter, true);
 		if (ret < 0)
 			return ret;
 	}
diff --git a/fs/read_write.c b/fs/read_write.c
index e5e891a88442ef..0a68037580b455 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1078,7 +1078,8 @@ static size_t compat_readv(struct file *file,
 	struct iov_iter iter;
 	ssize_t ret;
 
-	ret = compat_import_iovec(READ, vec, vlen, UIO_FASTIOV, &iov, &iter);
+	ret = import_iovec(READ, (const struct iovec __user *)vec, vlen,
+			   UIO_FASTIOV, &iov, &iter);
 	if (ret >= 0) {
 		ret = do_iter_read(file, &iter, pos, flags);
 		kfree(iov);
@@ -1186,7 +1187,8 @@ static size_t compat_writev(struct file *file,
 	struct iov_iter iter;
 	ssize_t ret;
 
-	ret = compat_import_iovec(WRITE, vec, vlen, UIO_FASTIOV, &iov, &iter);
+	ret = import_iovec(WRITE, (const struct iovec __user *)vec, vlen,
+			   UIO_FASTIOV, &iov, &iter);
 	if (ret >= 0) {
 		file_start_write(file);
 		ret = do_iter_write(file, &iter, pos, flags);
diff --git a/fs/splice.c b/fs/splice.c
index d7c8a7c4db07ff..132d42b9871f9b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1387,7 +1387,7 @@ COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, io
 	if (error)
 		return error;
 
-	error = compat_import_iovec(type, iov32, nr_segs,
+	error = import_iovec(type, (struct iovec __user *)iov32, nr_segs,
 			     ARRAY_SIZE(iovstack), &iov, &iter);
 	if (error >= 0) {
 		error = do_vmsplice(f.file, &iter, flags);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 3835a8a8e9eae0..fff5d49dd0d53e 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -269,13 +269,9 @@ size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
 ssize_t import_iovec(int type, const struct iovec __user * uvector,
 		 unsigned nr_segs, unsigned fast_segs,
 		 struct iovec **iov, struct iov_iter *i);
-
-#ifdef CONFIG_COMPAT
-struct compat_iovec;
-ssize_t compat_import_iovec(int type, const struct compat_iovec __user * uvector,
-		 unsigned nr_segs, unsigned fast_segs,
-		 struct iovec **iov, struct iov_iter *i);
-#endif
+ssize_t __import_iovec(int type, const struct iovec __user *uvector,
+		unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+		struct iov_iter *i, bool compat);
 
 int import_single_range(int type, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 8bfa47b63d39aa..632265178d8737 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1683,7 +1683,7 @@ static int compat_copy_iovecs_from_user(struct iovec *iov,
 	return ret;
 }
 
-static ssize_t __import_iovec(int type, const struct iovec __user *uvector,
+ssize_t __import_iovec(int type, const struct iovec __user *uvector,
 		unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
 		struct iov_iter *i, bool compat)
 {
@@ -1793,22 +1793,11 @@ ssize_t import_iovec(int type, const struct iovec __user * uvector,
 		 unsigned nr_segs, unsigned fast_segs,
 		 struct iovec **iov, struct iov_iter *i)
 {
-	return __import_iovec(type, uvector, nr_segs, fast_segs, iov, i, false);
+	return __import_iovec(type, uvector, nr_segs, fast_segs, iov, i,
+			      in_compat_syscall());
 }
 EXPORT_SYMBOL(import_iovec);
 
-#ifdef CONFIG_COMPAT
-ssize_t compat_import_iovec(int type,
-		const struct compat_iovec __user * uvector,
-		unsigned nr_segs, unsigned fast_segs,
-		struct iovec **iov, struct iov_iter *i)
-{
-	return __import_iovec(type, (const struct iovec __user *)uvector,
-			      nr_segs, fast_segs, iov, i, true);
-}
-EXPORT_SYMBOL(compat_import_iovec);
-#endif
-
 int import_single_range(int rw, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i)
 {
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 40cd502c337534..b759ed264840d8 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -331,13 +331,14 @@ compat_process_vm_rw(compat_pid_t pid,
 	if (flags != 0)
 		return -EINVAL;
 
-	rc = compat_import_iovec(dir, lvec, liovcnt, UIO_FASTIOV, &iov_l, &iter_l);
+	rc = import_iovec(dir, (const struct iovec __user *)iov_l, liovcnt,
+			  UIO_FASTIOV, &iov_l, &iter_l);
 	if (rc < 0)
 		return rc;
 	if (!iov_iter_count(&iter_l))
 		goto free_iovecs;
-	rc = compat_import_iovec(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV,
-				 &iov_r, &iter_r);
+	rc = import_iovec(CHECK_IOVEC_ONLY, iov_r, riovcnt, UIO_FASTIOV, &iov_r,
+			  &iter_r);
 	if (rc <= 0)
 		goto free_iovecs;
 
diff --git a/net/compat.c b/net/compat.c
index 95ce707a30a31d..ddd15af3a2837b 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -98,8 +98,8 @@ int get_compat_msghdr(struct msghdr *kmsg,
 	if (err)
 		return err;
 
-	err = compat_import_iovec(save_addr ? READ : WRITE, compat_ptr(ptr),
-				   len, UIO_FASTIOV, iov, &kmsg->msg_iter);
+	err = import_iovec(save_addr ? READ : WRITE, compat_ptr(ptr), len,
+			   UIO_FASTIOV, iov, &kmsg->msg_iter);
 	return err < 0 ? err : 0;
 }
 
diff --git a/security/keys/compat.c b/security/keys/compat.c
index 6ee9d8f6a4a5bb..7ae531db031cf8 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -33,9 +33,8 @@ static long compat_keyctl_instantiate_key_iov(
 	if (!_payload_iov)
 		ioc = 0;
 
-	ret = compat_import_iovec(WRITE, _payload_iov, ioc,
-				  ARRAY_SIZE(iovstack), &iov,
-				  &from);
+	ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
+			   ioc, ARRAY_SIZE(iovstack), &iov, &from);
 	if (ret < 0)
 		return ret;
 
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 07/11] fs: remove various compat readv/writev helpers
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
                     ` (5 preceding siblings ...)
  2020-09-21 14:34   ` [PATCH 06/11] iov_iter: handle the compat case in import_iovec Christoph Hellwig
@ 2020-09-21 14:34   ` Christoph Hellwig
  2020-09-21 14:34   ` [PATCH 08/11] fs: remove the compat readv/writev syscalls Christoph Hellwig
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs as well, all the duplicated
code in the compat readv/writev helpers is not needed.  Remove them
and switch the compat syscall handlers to use the native helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 179 ++++++++----------------------------------------
 1 file changed, 30 insertions(+), 149 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 0a68037580b455..eab427b7cc0a3f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1068,226 +1068,107 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
 	return do_pwritev(fd, vec, vlen, pos, flags);
 }
 
+/*
+ * Various compat syscalls.  Note that they all pretend to take a native
+ * iovec - import_iovec will properly treat those as compat_iovecs based on
+ * in_compat_syscall().
+ */
 #ifdef CONFIG_COMPAT
-static size_t compat_readv(struct file *file,
-			   const struct compat_iovec __user *vec,
-			   unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
-	ssize_t ret;
-
-	ret = import_iovec(READ, (const struct iovec __user *)vec, vlen,
-			   UIO_FASTIOV, &iov, &iter);
-	if (ret >= 0) {
-		ret = do_iter_read(file, &iter, pos, flags);
-		kfree(iov);
-	}
-	if (ret > 0)
-		add_rchar(current, ret);
-	inc_syscr(current);
-	return ret;
-}
-
-static size_t do_compat_readv(compat_ulong_t fd,
-				 const struct compat_iovec __user *vec,
-				 compat_ulong_t vlen, rwf_t flags)
-{
-	struct fd f = fdget_pos(fd);
-	ssize_t ret;
-	loff_t pos;
-
-	if (!f.file)
-		return -EBADF;
-	pos = f.file->f_pos;
-	ret = compat_readv(f.file, vec, vlen, &pos, flags);
-	if (ret >= 0)
-		f.file->f_pos = pos;
-	fdput_pos(f);
-	return ret;
-
-}
-
 COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		compat_ulong_t, vlen)
 {
-	return do_compat_readv(fd, vec, vlen, 0);
-}
-
-static long do_compat_preadv64(unsigned long fd,
-				  const struct compat_iovec __user *vec,
-				  unsigned long vlen, loff_t pos, rwf_t flags)
-{
-	struct fd f;
-	ssize_t ret;
-
-	if (pos < 0)
-		return -EINVAL;
-	f = fdget(fd);
-	if (!f.file)
-		return -EBADF;
-	ret = -ESPIPE;
-	if (f.file->f_mode & FMODE_PREAD)
-		ret = compat_readv(f.file, vec, vlen, &pos, flags);
-	fdput(f);
-	return ret;
+	return do_readv(fd, vec, vlen, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return do_compat_preadv64(fd, vec, vlen, pos, 0);
+	return do_preadv(fd, vec, vlen, pos, 0);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return do_compat_preadv64(fd, vec, vlen, pos, 0);
+	return do_preadv(fd, vec, vlen, pos, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
 COMPAT_SYSCALL_DEFINE5(preadv64v2, unsigned long, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		unsigned long, vlen, loff_t, pos, rwf_t, flags)
 {
 	if (pos = -1)
-		return do_compat_readv(fd, vec, vlen, flags);
-
-	return do_compat_preadv64(fd, vec, vlen, pos, flags);
+		return do_readv(fd, vec, vlen, flags);
+	return do_preadv(fd, vec, vlen, pos, flags);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
 		rwf_t, flags)
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
 	if (pos = -1)
-		return do_compat_readv(fd, vec, vlen, flags);
-
-	return do_compat_preadv64(fd, vec, vlen, pos, flags);
-}
-
-static size_t compat_writev(struct file *file,
-			    const struct compat_iovec __user *vec,
-			    unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
-	ssize_t ret;
-
-	ret = import_iovec(WRITE, (const struct iovec __user *)vec, vlen,
-			   UIO_FASTIOV, &iov, &iter);
-	if (ret >= 0) {
-		file_start_write(file);
-		ret = do_iter_write(file, &iter, pos, flags);
-		file_end_write(file);
-		kfree(iov);
-	}
-	if (ret > 0)
-		add_wchar(current, ret);
-	inc_syscw(current);
-	return ret;
-}
-
-static size_t do_compat_writev(compat_ulong_t fd,
-				  const struct compat_iovec __user* vec,
-				  compat_ulong_t vlen, rwf_t flags)
-{
-	struct fd f = fdget_pos(fd);
-	ssize_t ret;
-	loff_t pos;
-
-	if (!f.file)
-		return -EBADF;
-	pos = f.file->f_pos;
-	ret = compat_writev(f.file, vec, vlen, &pos, flags);
-	if (ret >= 0)
-		f.file->f_pos = pos;
-	fdput_pos(f);
-	return ret;
+		return do_readv(fd, vec, vlen, flags);
+	return do_preadv(fd, vec, vlen, pos, flags);
 }
 
 COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
-		const struct compat_iovec __user *, vec,
+		const struct iovec __user *, vec,
 		compat_ulong_t, vlen)
 {
-	return do_compat_writev(fd, vec, vlen, 0);
-}
-
-static long do_compat_pwritev64(unsigned long fd,
-				   const struct compat_iovec __user *vec,
-				   unsigned long vlen, loff_t pos, rwf_t flags)
-{
-	struct fd f;
-	ssize_t ret;
-
-	if (pos < 0)
-		return -EINVAL;
-	f = fdget(fd);
-	if (!f.file)
-		return -EBADF;
-	ret = -ESPIPE;
-	if (f.file->f_mode & FMODE_PWRITE)
-		ret = compat_writev(f.file, vec, vlen, &pos, flags);
-	fdput(f);
-	return ret;
+	return do_writev(fd, vec, vlen, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64
 COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
+	return do_pwritev(fd, vec, vlen, pos, 0);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *,vec,
 		compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
+	return do_pwritev(fd, vec, vlen, pos, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64V2
 COMPAT_SYSCALL_DEFINE5(pwritev64v2, unsigned long, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		unsigned long, vlen, loff_t, pos, rwf_t, flags)
 {
 	if (pos = -1)
-		return do_compat_writev(fd, vec, vlen, flags);
-
-	return do_compat_pwritev64(fd, vec, vlen, pos, flags);
+		return do_writev(fd, vec, vlen, flags);
+	return do_pwritev(fd, vec, vlen, pos, flags);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *,vec,
 		compat_ulong_t, vlen, u32, pos_low, u32, pos_high, rwf_t, flags)
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
 	if (pos = -1)
-		return do_compat_writev(fd, vec, vlen, flags);
-
-	return do_compat_pwritev64(fd, vec, vlen, pos, flags);
+		return do_writev(fd, vec, vlen, flags);
+	return do_pwritev(fd, vec, vlen, pos, flags);
 }
-
-#endif
+#endif /* CONFIG_COMPAT */
 
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 		  	   size_t count, loff_t max)
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 08/11] fs: remove the compat readv/writev syscalls
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
                     ` (6 preceding siblings ...)
  2020-09-21 14:34   ` [PATCH 07/11] fs: remove various compat readv/writev helpers Christoph Hellwig
@ 2020-09-21 14:34   ` Christoph Hellwig
  2020-09-21 14:34   ` [PATCH 09/11] fs: remove compat_sys_vmsplice Christoph Hellwig
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native readv and writev
syscalls can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/unistd32.h                  |  4 ++--
 arch/mips/kernel/syscalls/syscall_n32.tbl          |  4 ++--
 arch/mips/kernel/syscalls/syscall_o32.tbl          |  4 ++--
 arch/parisc/kernel/syscalls/syscall.tbl            |  4 ++--
 arch/powerpc/kernel/syscalls/syscall.tbl           |  4 ++--
 arch/s390/kernel/syscalls/syscall.tbl              |  4 ++--
 arch/sparc/kernel/syscalls/syscall.tbl             |  4 ++--
 arch/x86/entry/syscall_x32.c                       |  2 ++
 arch/x86/entry/syscalls/syscall_32.tbl             |  4 ++--
 arch/x86/entry/syscalls/syscall_64.tbl             |  4 ++--
 fs/read_write.c                                    | 14 --------------
 include/linux/compat.h                             |  4 ----
 include/uapi/asm-generic/unistd.h                  |  4 ++--
 tools/include/uapi/asm-generic/unistd.h            |  4 ++--
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |  4 ++--
 tools/perf/arch/s390/entry/syscalls/syscall.tbl    |  4 ++--
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |  4 ++--
 17 files changed, 30 insertions(+), 46 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 734860ac7cf9d5..4a236493dca5b9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -301,9 +301,9 @@ __SYSCALL(__NR_flock, sys_flock)
 #define __NR_msync 144
 __SYSCALL(__NR_msync, sys_msync)
 #define __NR_readv 145
-__SYSCALL(__NR_readv, compat_sys_readv)
+__SYSCALL(__NR_readv, sys_readv)
 #define __NR_writev 146
-__SYSCALL(__NR_writev, compat_sys_writev)
+__SYSCALL(__NR_writev, sys_writev)
 #define __NR_getsid 147
 __SYSCALL(__NR_getsid, sys_getsid)
 #define __NR_fdatasync 148
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f9df9edb67a407..c99a92646f8ee9 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -25,8 +25,8 @@
 15	n32	ioctl				compat_sys_ioctl
 16	n32	pread64				sys_pread64
 17	n32	pwrite64			sys_pwrite64
-18	n32	readv				compat_sys_readv
-19	n32	writev				compat_sys_writev
+18	n32	readv				sys_readv
+19	n32	writev				sys_writev
 20	n32	access				sys_access
 21	n32	pipe				sysm_pipe
 22	n32	_newselect			compat_sys_select
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 195b43cf27c848..075064d10661bf 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -156,8 +156,8 @@
 142	o32	_newselect			sys_select			compat_sys_select
 143	o32	flock				sys_flock
 144	o32	msync				sys_msync
-145	o32	readv				sys_readv			compat_sys_readv
-146	o32	writev				sys_writev			compat_sys_writev
+145	o32	readv				sys_readv
+146	o32	writev				sys_writev
 147	o32	cacheflush			sys_cacheflush
 148	o32	cachectl			sys_cachectl
 149	o32	sysmips				__sys_sysmips
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index def64d221cd4fb..192abde0001d9d 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -159,8 +159,8 @@
 142	common	_newselect		sys_select			compat_sys_select
 143	common	flock			sys_flock
 144	common	msync			sys_msync
-145	common	readv			sys_readv			compat_sys_readv
-146	common	writev			sys_writev			compat_sys_writev
+145	common	readv			sys_readv
+146	common	writev			sys_writev
 147	common	getsid			sys_getsid
 148	common	fdatasync		sys_fdatasync
 149	common	_sysctl			sys_ni_syscall
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index c2d737ff2e7bec..6f1e2ecf0edad9 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -193,8 +193,8 @@
 142	common	_newselect			sys_select			compat_sys_select
 143	common	flock				sys_flock
 144	common	msync				sys_msync
-145	common	readv				sys_readv			compat_sys_readv
-146	common	writev				sys_writev			compat_sys_writev
+145	common	readv				sys_readv
+146	common	writev				sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
 149	nospu	_sysctl				sys_ni_syscall
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 10456bc936fb09..6101cf2e004cb4 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -134,8 +134,8 @@
 142  64		select			sys_select			-
 143  common	flock			sys_flock			sys_flock
 144  common	msync			sys_msync			sys_msync
-145  common	readv			sys_readv			compat_sys_readv
-146  common	writev			sys_writev			compat_sys_writev
+145  common	readv			sys_readv			sys_readv
+146  common	writev			sys_writev			sys_writev
 147  common	getsid			sys_getsid			sys_getsid
 148  common	fdatasync		sys_fdatasync			sys_fdatasync
 149  common	_sysctl			-				-
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 4af114e84f2022..a87ddb282ab16f 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -149,8 +149,8 @@
 117	common	getrusage		sys_getrusage			compat_sys_getrusage
 118	common	getsockopt		sys_getsockopt			sys_getsockopt
 119	common	getcwd			sys_getcwd
-120	common	readv			sys_readv			compat_sys_readv
-121	common	writev			sys_writev			compat_sys_writev
+120	common	readv			sys_readv
+121	common	writev			sys_writev
 122	common	settimeofday		sys_settimeofday		compat_sys_settimeofday
 123	32	fchown			sys_fchown16
 123	64	fchown			sys_fchown
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 1583831f61a9df..aa321444a41f63 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -12,6 +12,8 @@
  * Reuse the 64-bit entry points for the x32 versions that occupy different
  * slots in the syscall table.
  */
+#define __x32_sys_readv		__x64_sys_readv
+#define __x32_sys_writev	__x64_sys_writev
 #define __x32_sys_getsockopt	__x64_sys_getsockopt
 #define __x32_sys_setsockopt	__x64_sys_setsockopt
 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 9d11028736661b..54ab4beb517f25 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -156,8 +156,8 @@
 142	i386	_newselect		sys_select			compat_sys_select
 143	i386	flock			sys_flock
 144	i386	msync			sys_msync
-145	i386	readv			sys_readv			compat_sys_readv
-146	i386	writev			sys_writev			compat_sys_writev
+145	i386	readv			sys_readv
+146	i386	writev			sys_writev
 147	i386	getsid			sys_getsid
 148	i386	fdatasync		sys_fdatasync
 149	i386	_sysctl			sys_ni_syscall
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index f30d6ae9a6883c..b1e59957c5c51c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -371,8 +371,8 @@
 512	x32	rt_sigaction		compat_sys_rt_sigaction
 513	x32	rt_sigreturn		compat_sys_x32_rt_sigreturn
 514	x32	ioctl			compat_sys_ioctl
-515	x32	readv			compat_sys_readv
-516	x32	writev			compat_sys_writev
+515	x32	readv			sys_readv
+516	x32	writev			sys_writev
 517	x32	recvfrom		compat_sys_recvfrom
 518	x32	sendmsg			compat_sys_sendmsg
 519	x32	recvmsg			compat_sys_recvmsg
diff --git a/fs/read_write.c b/fs/read_write.c
index eab427b7cc0a3f..6c13f744c34a38 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1074,13 +1074,6 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
  * in_compat_syscall().
  */
 #ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-		const struct iovec __user *, vec,
-		compat_ulong_t, vlen)
-{
-	return do_readv(fd, vec, vlen, 0);
-}
-
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
 		const struct iovec __user *, vec,
@@ -1122,13 +1115,6 @@ COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
 	return do_preadv(fd, vec, vlen, pos, flags);
 }
 
-COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
-		const struct iovec __user *, vec,
-		compat_ulong_t, vlen)
-{
-	return do_writev(fd, vec, vlen, 0);
-}
-
 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64
 COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
 		const struct iovec __user *, vec,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index b930de791ff16b..fa39b7a5488d70 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -545,10 +545,6 @@ asmlinkage long compat_sys_getdents(unsigned int fd,
 
 /* fs/read_write.c */
 asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);
-asmlinkage ssize_t compat_sys_readv(compat_ulong_t fd,
-		const struct compat_iovec __user *vec, compat_ulong_t vlen);
-asmlinkage ssize_t compat_sys_writev(compat_ulong_t fd,
-		const struct compat_iovec __user *vec, compat_ulong_t vlen);
 /* No generic prototype for pread64 and pwrite64 */
 asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
 		const struct compat_iovec __user *vec,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 995b36c2ea7d8a..211c9eacbda6eb 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -207,9 +207,9 @@ __SYSCALL(__NR_read, sys_read)
 #define __NR_write 64
 __SYSCALL(__NR_write, sys_write)
 #define __NR_readv 65
-__SC_COMP(__NR_readv, sys_readv, compat_sys_readv)
+__SC_COMP(__NR_readv, sys_readv, sys_readv)
 #define __NR_writev 66
-__SC_COMP(__NR_writev, sys_writev, compat_sys_writev)
+__SC_COMP(__NR_writev, sys_writev, sys_writev)
 #define __NR_pread64 67
 __SC_COMP(__NR_pread64, sys_pread64, compat_sys_pread64)
 #define __NR_pwrite64 68
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 995b36c2ea7d8a..211c9eacbda6eb 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -207,9 +207,9 @@ __SYSCALL(__NR_read, sys_read)
 #define __NR_write 64
 __SYSCALL(__NR_write, sys_write)
 #define __NR_readv 65
-__SC_COMP(__NR_readv, sys_readv, compat_sys_readv)
+__SC_COMP(__NR_readv, sys_readv, sys_readv)
 #define __NR_writev 66
-__SC_COMP(__NR_writev, sys_writev, compat_sys_writev)
+__SC_COMP(__NR_writev, sys_writev, sys_writev)
 #define __NR_pread64 67
 __SC_COMP(__NR_pread64, sys_pread64, compat_sys_pread64)
 #define __NR_pwrite64 68
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 3ca6fe057a0b1f..46be68029587f9 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -189,8 +189,8 @@
 142	common	_newselect			sys_select			compat_sys_select
 143	common	flock				sys_flock
 144	common	msync				sys_msync
-145	common	readv				sys_readv			compat_sys_readv
-146	common	writev				sys_writev			compat_sys_writev
+145	common	readv				sys_readv
+146	common	writev				sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
 149	nospu	_sysctl				sys_ni_syscall
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index 6a0bbea225db0d..fb5e61ce9d5838 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -134,8 +134,8 @@
 142  64		select			sys_select			-
 143  common	flock			sys_flock			sys_flock
 144  common	msync			sys_msync			compat_sys_msync
-145  common	readv			sys_readv			compat_sys_readv
-146  common	writev			sys_writev			compat_sys_writev
+145  common	readv			sys_readv
+146  common	writev			sys_writev
 147  common	getsid			sys_getsid			sys_getsid
 148  common	fdatasync		sys_fdatasync			sys_fdatasync
 149  common	_sysctl			-				-
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index f30d6ae9a6883c..b1e59957c5c51c 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -371,8 +371,8 @@
 512	x32	rt_sigaction		compat_sys_rt_sigaction
 513	x32	rt_sigreturn		compat_sys_x32_rt_sigreturn
 514	x32	ioctl			compat_sys_ioctl
-515	x32	readv			compat_sys_readv
-516	x32	writev			compat_sys_writev
+515	x32	readv			sys_readv
+516	x32	writev			sys_writev
 517	x32	recvfrom		compat_sys_recvfrom
 518	x32	sendmsg			compat_sys_sendmsg
 519	x32	recvmsg			compat_sys_recvmsg
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 09/11] fs: remove compat_sys_vmsplice
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
                     ` (7 preceding siblings ...)
  2020-09-21 14:34   ` [PATCH 08/11] fs: remove the compat readv/writev syscalls Christoph Hellwig
@ 2020-09-21 14:34   ` Christoph Hellwig
  2020-09-21 14:34   ` [PATCH 10/11] mm: remove compat_process_vm_{readv,writev} Christoph Hellwig
  2020-09-21 14:34   ` [PATCH 11/11] security/keys: remove compat_keyctl_instantiate_key_iov Christoph Hellwig
  10 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native vmsplice syscall
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/unistd32.h             |  2 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl     |  2 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl     |  2 +-
 arch/parisc/kernel/syscalls/syscall.tbl       |  2 +-
 arch/powerpc/kernel/syscalls/syscall.tbl      |  2 +-
 arch/s390/kernel/syscalls/syscall.tbl         |  2 +-
 arch/sparc/kernel/syscalls/syscall.tbl        |  2 +-
 arch/x86/entry/syscall_x32.c                  |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl        |  2 +-
 arch/x86/entry/syscalls/syscall_64.tbl        |  2 +-
 fs/splice.c                                   | 57 +++++--------------
 include/linux/compat.h                        |  4 --
 include/uapi/asm-generic/unistd.h             |  2 +-
 tools/include/uapi/asm-generic/unistd.h       |  2 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  2 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  2 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl    |  2 +-
 17 files changed, 28 insertions(+), 62 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 4a236493dca5b9..11dfae3a8563bd 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -697,7 +697,7 @@ __SYSCALL(__NR_sync_file_range2, compat_sys_aarch32_sync_file_range2)
 #define __NR_tee 342
 __SYSCALL(__NR_tee, sys_tee)
 #define __NR_vmsplice 343
-__SYSCALL(__NR_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages 344
 __SYSCALL(__NR_move_pages, compat_sys_move_pages)
 #define __NR_getcpu 345
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index c99a92646f8ee9..5a39d4de0ac85b 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -278,7 +278,7 @@
 267	n32	splice				sys_splice
 268	n32	sync_file_range			sys_sync_file_range
 269	n32	tee				sys_tee
-270	n32	vmsplice			compat_sys_vmsplice
+270	n32	vmsplice			sys_vmsplice
 271	n32	move_pages			compat_sys_move_pages
 272	n32	set_robust_list			compat_sys_set_robust_list
 273	n32	get_robust_list			compat_sys_get_robust_list
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 075064d10661bf..136efc6b8c5444 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -318,7 +318,7 @@
 304	o32	splice				sys_splice
 305	o32	sync_file_range			sys_sync_file_range		sys32_sync_file_range
 306	o32	tee				sys_tee
-307	o32	vmsplice			sys_vmsplice			compat_sys_vmsplice
+307	o32	vmsplice			sys_vmsplice
 308	o32	move_pages			sys_move_pages			compat_sys_move_pages
 309	o32	set_robust_list			sys_set_robust_list		compat_sys_set_robust_list
 310	o32	get_robust_list			sys_get_robust_list		compat_sys_get_robust_list
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 192abde0001d9d..a9e184192caedd 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -330,7 +330,7 @@
 292	32	sync_file_range		parisc_sync_file_range
 292	64	sync_file_range		sys_sync_file_range
 293	common	tee			sys_tee
-294	common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+294	common	vmsplice		sys_vmsplice
 295	common	move_pages		sys_move_pages			compat_sys_move_pages
 296	common	getcpu			sys_getcpu
 297	common	epoll_pwait		sys_epoll_pwait			compat_sys_epoll_pwait
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 6f1e2ecf0edad9..0d4985919ca34d 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -369,7 +369,7 @@
 282	common	unshare				sys_unshare
 283	common	splice				sys_splice
 284	common	tee				sys_tee
-285	common	vmsplice			sys_vmsplice			compat_sys_vmsplice
+285	common	vmsplice			sys_vmsplice
 286	common	openat				sys_openat			compat_sys_openat
 287	common	mkdirat				sys_mkdirat
 288	common	mknodat				sys_mknodat
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 6101cf2e004cb4..b5495a42814bd1 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -316,7 +316,7 @@
 306  common	splice			sys_splice			sys_splice
 307  common	sync_file_range		sys_sync_file_range		compat_sys_s390_sync_file_range
 308  common	tee			sys_tee				sys_tee
-309  common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+309  common	vmsplice		sys_vmsplice			sys_vmsplice
 310  common	move_pages		sys_move_pages			compat_sys_move_pages
 311  common	getcpu			sys_getcpu			sys_getcpu
 312  common	epoll_pwait		sys_epoll_pwait			compat_sys_epoll_pwait
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index a87ddb282ab16f..f1810c1a35caa5 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -38,7 +38,7 @@
 23	64    	setuid			sys_setuid
 24	32	getuid			sys_getuid16
 24	64   	getuid			sys_getuid
-25	common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+25	common	vmsplice		sys_vmsplice
 26	common	ptrace			sys_ptrace			compat_sys_ptrace
 27	common	alarm			sys_alarm
 28	common	sigaltstack		sys_sigaltstack			compat_sys_sigaltstack
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index aa321444a41f63..a4840b9d50ad14 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -16,6 +16,7 @@
 #define __x32_sys_writev	__x64_sys_writev
 #define __x32_sys_getsockopt	__x64_sys_getsockopt
 #define __x32_sys_setsockopt	__x64_sys_setsockopt
+#define __x32_sys_vmsplice	__x64_sys_vmsplice
 
 #define __SYSCALL_64(nr, sym)
 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 54ab4beb517f25..0fb2f172581e51 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -327,7 +327,7 @@
 313	i386	splice			sys_splice
 314	i386	sync_file_range		sys_ia32_sync_file_range
 315	i386	tee			sys_tee
-316	i386	vmsplice		sys_vmsplice			compat_sys_vmsplice
+316	i386	vmsplice		sys_vmsplice
 317	i386	move_pages		sys_move_pages			compat_sys_move_pages
 318	i386	getcpu			sys_getcpu
 319	i386	epoll_pwait		sys_epoll_pwait
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index b1e59957c5c51c..642af919183de4 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -388,7 +388,7 @@
 529	x32	waitid			compat_sys_waitid
 530	x32	set_robust_list		compat_sys_set_robust_list
 531	x32	get_robust_list		compat_sys_get_robust_list
-532	x32	vmsplice		compat_sys_vmsplice
+532	x32	vmsplice		sys_vmsplice
 533	x32	move_pages		compat_sys_move_pages
 534	x32	preadv			compat_sys_preadv64
 535	x32	pwritev			compat_sys_pwritev64
diff --git a/fs/splice.c b/fs/splice.c
index 132d42b9871f9b..18d84544030b39 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -33,7 +33,6 @@
 #include <linux/security.h>
 #include <linux/gfp.h>
 #include <linux/socket.h>
-#include <linux/compat.h>
 #include <linux/sched/signal.h>
 
 #include "internal.h"
@@ -1332,20 +1331,6 @@ static int vmsplice_type(struct fd f, int *type)
  * Currently we punt and implement it as a normal copy, see pipe_to_user().
  *
  */
-static long do_vmsplice(struct file *f, struct iov_iter *iter, unsigned int flags)
-{
-	if (unlikely(flags & ~SPLICE_F_ALL))
-		return -EINVAL;
-
-	if (!iov_iter_count(iter))
-		return 0;
-
-	if (iov_iter_rw(iter) = WRITE)
-		return vmsplice_to_pipe(f, iter, flags);
-	else
-		return vmsplice_to_user(f, iter, flags);
-}
-
 SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 		unsigned long, nr_segs, unsigned int, flags)
 {
@@ -1356,6 +1341,9 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 	struct fd f;
 	int type;
 
+	if (unlikely(flags & ~SPLICE_F_ALL))
+		return -EINVAL;
+
 	f = fdget(fd);
 	error = vmsplice_type(f, &type);
 	if (error)
@@ -1363,40 +1351,21 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 
 	error = import_iovec(type, uiov, nr_segs,
 			     ARRAY_SIZE(iovstack), &iov, &iter);
-	if (error >= 0) {
-		error = do_vmsplice(f.file, &iter, flags);
-		kfree(iov);
-	}
-	fdput(f);
-	return error;
-}
+	if (error < 0)
+		goto out_fdput;
 
-#ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, iov32,
-		    unsigned int, nr_segs, unsigned int, flags)
-{
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
-	ssize_t error;
-	struct fd f;
-	int type;
-
-	f = fdget(fd);
-	error = vmsplice_type(f, &type);
-	if (error)
-		return error;
+	if (!iov_iter_count(&iter))
+		error = 0;
+	else if (iov_iter_rw(&iter) = WRITE)
+		error = vmsplice_to_pipe(f.file, &iter, flags);
+	else
+		error = vmsplice_to_user(f.file, &iter, flags);
 
-	error = import_iovec(type, (struct iovec __user *)iov32, nr_segs,
-			     ARRAY_SIZE(iovstack), &iov, &iter);
-	if (error >= 0) {
-		error = do_vmsplice(f.file, &iter, flags);
-		kfree(iov);
-	}
+	kfree(iov);
+out_fdput:
 	fdput(f);
 	return error;
 }
-#endif
 
 SYSCALL_DEFINE6(splice, int, fd_in, loff_t __user *, off_in,
 		int, fd_out, loff_t __user *, off_out,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index fa39b7a5488d70..14d8bf412bd02e 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -597,10 +597,6 @@ asmlinkage long compat_sys_signalfd4(int ufd,
 				     const compat_sigset_t __user *sigmask,
 				     compat_size_t sigsetsize, int flags);
 
-/* fs/splice.c */
-asmlinkage long compat_sys_vmsplice(int fd, const struct compat_iovec __user *,
-				    unsigned int nr_segs, unsigned int flags);
-
 /* fs/stat.c */
 asmlinkage long compat_sys_newfstatat(unsigned int dfd,
 				      const char __user *filename,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 211c9eacbda6eb..f2dcb0d5703014 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -237,7 +237,7 @@ __SC_COMP(__NR_signalfd4, sys_signalfd4, compat_sys_signalfd4)
 
 /* fs/splice.c */
 #define __NR_vmsplice 75
-__SC_COMP(__NR_vmsplice, sys_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_splice 76
 __SYSCALL(__NR_splice, sys_splice)
 #define __NR_tee 77
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 211c9eacbda6eb..f2dcb0d5703014 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -237,7 +237,7 @@ __SC_COMP(__NR_signalfd4, sys_signalfd4, compat_sys_signalfd4)
 
 /* fs/splice.c */
 #define __NR_vmsplice 75
-__SC_COMP(__NR_vmsplice, sys_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_splice 76
 __SYSCALL(__NR_splice, sys_splice)
 #define __NR_tee 77
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 46be68029587f9..26f0347c15118b 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -363,7 +363,7 @@
 282	common	unshare				sys_unshare
 283	common	splice				sys_splice
 284	common	tee				sys_tee
-285	common	vmsplice			sys_vmsplice			compat_sys_vmsplice
+285	common	vmsplice			sys_vmsplice
 286	common	openat				sys_openat			compat_sys_openat
 287	common	mkdirat				sys_mkdirat
 288	common	mknodat				sys_mknodat
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index fb5e61ce9d5838..02ad81f69bb7e3 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -316,7 +316,7 @@
 306  common	splice			sys_splice			compat_sys_splice
 307  common	sync_file_range		sys_sync_file_range		compat_sys_s390_sync_file_range
 308  common	tee			sys_tee				compat_sys_tee
-309  common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+309  common	vmsplice		sys_vmsplice			sys_vmsplice
 310  common	move_pages		sys_move_pages			compat_sys_move_pages
 311  common	getcpu			sys_getcpu			compat_sys_getcpu
 312  common	epoll_pwait		sys_epoll_pwait			compat_sys_epoll_pwait
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index b1e59957c5c51c..642af919183de4 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -388,7 +388,7 @@
 529	x32	waitid			compat_sys_waitid
 530	x32	set_robust_list		compat_sys_set_robust_list
 531	x32	get_robust_list		compat_sys_get_robust_list
-532	x32	vmsplice		compat_sys_vmsplice
+532	x32	vmsplice		sys_vmsplice
 533	x32	move_pages		compat_sys_move_pages
 534	x32	preadv			compat_sys_preadv64
 535	x32	pwritev			compat_sys_pwritev64
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 10/11] mm: remove compat_process_vm_{readv,writev}
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
                     ` (8 preceding siblings ...)
  2020-09-21 14:34   ` [PATCH 09/11] fs: remove compat_sys_vmsplice Christoph Hellwig
@ 2020-09-21 14:34   ` Christoph Hellwig
  2020-09-21 14:34   ` [PATCH 11/11] security/keys: remove compat_keyctl_instantiate_key_iov Christoph Hellwig
  10 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native syscalls
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/unistd32.h             |  4 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl     |  4 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl     |  4 +-
 arch/parisc/kernel/syscalls/syscall.tbl       |  4 +-
 arch/powerpc/kernel/syscalls/syscall.tbl      |  4 +-
 arch/s390/kernel/syscalls/syscall.tbl         |  4 +-
 arch/sparc/kernel/syscalls/syscall.tbl        |  4 +-
 arch/x86/entry/syscall_x32.c                  |  2 +
 arch/x86/entry/syscalls/syscall_32.tbl        |  4 +-
 arch/x86/entry/syscalls/syscall_64.tbl        |  4 +-
 include/linux/compat.h                        |  8 ---
 include/uapi/asm-generic/unistd.h             |  6 +-
 mm/process_vm_access.c                        | 71 -------------------
 tools/include/uapi/asm-generic/unistd.h       |  6 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  4 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  4 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl    |  4 +-
 17 files changed, 30 insertions(+), 111 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 11dfae3a8563bd..0c280a05f699bf 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -763,9 +763,9 @@ __SYSCALL(__NR_sendmmsg, compat_sys_sendmmsg)
 #define __NR_setns 375
 __SYSCALL(__NR_setns, sys_setns)
 #define __NR_process_vm_readv 376
-__SYSCALL(__NR_process_vm_readv, compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 377
-__SYSCALL(__NR_process_vm_writev, compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 378
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 379
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 5a39d4de0ac85b..0bc2e0fcf1ee56 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -317,8 +317,8 @@
 306	n32	syncfs				sys_syncfs
 307	n32	sendmmsg			compat_sys_sendmmsg
 308	n32	setns				sys_setns
-309	n32	process_vm_readv		compat_sys_process_vm_readv
-310	n32	process_vm_writev		compat_sys_process_vm_writev
+309	n32	process_vm_readv		sys_process_vm_readv
+310	n32	process_vm_writev		sys_process_vm_writev
 311	n32	kcmp				sys_kcmp
 312	n32	finit_module			sys_finit_module
 313	n32	sched_setattr			sys_sched_setattr
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 136efc6b8c5444..b408c13b934296 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -356,8 +356,8 @@
 342	o32	syncfs				sys_syncfs
 343	o32	sendmmsg			sys_sendmmsg			compat_sys_sendmmsg
 344	o32	setns				sys_setns
-345	o32	process_vm_readv		sys_process_vm_readv		compat_sys_process_vm_readv
-346	o32	process_vm_writev		sys_process_vm_writev		compat_sys_process_vm_writev
+345	o32	process_vm_readv		sys_process_vm_readv
+346	o32	process_vm_writev		sys_process_vm_writev
 347	o32	kcmp				sys_kcmp
 348	o32	finit_module			sys_finit_module
 349	o32	sched_setattr			sys_sched_setattr
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index a9e184192caedd..2015a5124b78ad 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -372,8 +372,8 @@
 327	common	syncfs			sys_syncfs
 328	common	setns			sys_setns
 329	common	sendmmsg		sys_sendmmsg			compat_sys_sendmmsg
-330	common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-331	common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+330	common	process_vm_readv	sys_process_vm_readv
+331	common	process_vm_writev	sys_process_vm_writev
 332	common	kcmp			sys_kcmp
 333	common	finit_module		sys_finit_module
 334	common	sched_setattr		sys_sched_setattr
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 0d4985919ca34d..66a472aa635d3f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -449,8 +449,8 @@
 348	common	syncfs				sys_syncfs
 349	common	sendmmsg			sys_sendmmsg			compat_sys_sendmmsg
 350	common	setns				sys_setns
-351	nospu	process_vm_readv		sys_process_vm_readv		compat_sys_process_vm_readv
-352	nospu	process_vm_writev		sys_process_vm_writev		compat_sys_process_vm_writev
+351	nospu	process_vm_readv		sys_process_vm_readv
+352	nospu	process_vm_writev		sys_process_vm_writev
 353	nospu	finit_module			sys_finit_module
 354	nospu	kcmp				sys_kcmp
 355	common	sched_setattr			sys_sched_setattr
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index b5495a42814bd1..7485867a490bb2 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -347,8 +347,8 @@
 337  common	clock_adjtime		sys_clock_adjtime		sys_clock_adjtime32
 338  common	syncfs			sys_syncfs			sys_syncfs
 339  common	setns			sys_setns			sys_setns
-340  common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-341  common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+340  common	process_vm_readv	sys_process_vm_readv		sys_process_vm_readv
+341  common	process_vm_writev	sys_process_vm_writev		sys_process_vm_writev
 342  common	s390_runtime_instr	sys_s390_runtime_instr		sys_s390_runtime_instr
 343  common	kcmp			sys_kcmp			sys_kcmp
 344  common	finit_module		sys_finit_module		sys_finit_module
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index f1810c1a35caa5..4a9365b2e340b2 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -406,8 +406,8 @@
 335	common	syncfs			sys_syncfs
 336	common	sendmmsg		sys_sendmmsg			compat_sys_sendmmsg
 337	common	setns			sys_setns
-338	common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-339	common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+338	common	process_vm_readv	sys_process_vm_readv
+339	common	process_vm_writev	sys_process_vm_writev
 340	32	kern_features		sys_ni_syscall			sys_kern_features
 340	64	kern_features		sys_kern_features
 341	common	kcmp			sys_kcmp
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index a4840b9d50ad14..f2fe0a33bcfdd5 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -17,6 +17,8 @@
 #define __x32_sys_getsockopt	__x64_sys_getsockopt
 #define __x32_sys_setsockopt	__x64_sys_setsockopt
 #define __x32_sys_vmsplice	__x64_sys_vmsplice
+#define __x32_sys_process_vm_readv	__x64_sys_process_vm_readv
+#define __x32_sys_process_vm_writev	__x64_sys_process_vm_writev
 
 #define __SYSCALL_64(nr, sym)
 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 0fb2f172581e51..5fbe10ad8a23fc 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -358,8 +358,8 @@
 344	i386	syncfs			sys_syncfs
 345	i386	sendmmsg		sys_sendmmsg			compat_sys_sendmmsg
 346	i386	setns			sys_setns
-347	i386	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-348	i386	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+347	i386	process_vm_readv	sys_process_vm_readv
+348	i386	process_vm_writev	sys_process_vm_writev
 349	i386	kcmp			sys_kcmp
 350	i386	finit_module		sys_finit_module
 351	i386	sched_setattr		sys_sched_setattr
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 642af919183de4..347809649ba28f 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -395,8 +395,8 @@
 536	x32	rt_tgsigqueueinfo	compat_sys_rt_tgsigqueueinfo
 537	x32	recvmmsg		compat_sys_recvmmsg_time64
 538	x32	sendmmsg		compat_sys_sendmmsg
-539	x32	process_vm_readv	compat_sys_process_vm_readv
-540	x32	process_vm_writev	compat_sys_process_vm_writev
+539	x32	process_vm_readv	sys_process_vm_readv
+540	x32	process_vm_writev	sys_process_vm_writev
 541	x32	setsockopt		sys_setsockopt
 542	x32	getsockopt		sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 14d8bf412bd02e..d46e5905970817 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -780,14 +780,6 @@ asmlinkage long compat_sys_open_by_handle_at(int mountdirfd,
 					     int flags);
 asmlinkage long compat_sys_sendmmsg(int fd, struct compat_mmsghdr __user *mmsg,
 				    unsigned vlen, unsigned int flags);
-asmlinkage ssize_t compat_sys_process_vm_readv(compat_pid_t pid,
-		const struct compat_iovec __user *lvec,
-		compat_ulong_t liovcnt, const struct compat_iovec __user *rvec,
-		compat_ulong_t riovcnt, compat_ulong_t flags);
-asmlinkage ssize_t compat_sys_process_vm_writev(compat_pid_t pid,
-		const struct compat_iovec __user *lvec,
-		compat_ulong_t liovcnt, const struct compat_iovec __user *rvec,
-		compat_ulong_t riovcnt, compat_ulong_t flags);
 asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
 		     const compat_uptr_t __user *argv,
 		     const compat_uptr_t __user *envp, int flags);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f2dcb0d5703014..c1dfe99c9c3f70 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -727,11 +727,9 @@ __SYSCALL(__NR_setns, sys_setns)
 #define __NR_sendmmsg 269
 __SC_COMP(__NR_sendmmsg, sys_sendmmsg, compat_sys_sendmmsg)
 #define __NR_process_vm_readv 270
-__SC_COMP(__NR_process_vm_readv, sys_process_vm_readv, \
-          compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 271
-__SC_COMP(__NR_process_vm_writev, sys_process_vm_writev, \
-          compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 272
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 273
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index b759ed264840d8..a5f49ae45643fc 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -14,10 +14,6 @@
 #include <linux/slab.h>
 #include <linux/syscalls.h>
 
-#ifdef CONFIG_COMPAT
-#include <linux/compat.h>
-#endif
-
 /**
  * process_vm_rw_pages - read/write pages from task specified
  * @pages: array of pointers to pages we want to copy
@@ -309,70 +305,3 @@ SYSCALL_DEFINE6(process_vm_writev, pid_t, pid,
 {
 	return process_vm_rw(pid, lvec, liovcnt, rvec, riovcnt, flags, 1);
 }
-
-#ifdef CONFIG_COMPAT
-
-static ssize_t
-compat_process_vm_rw(compat_pid_t pid,
-		     const struct compat_iovec __user *lvec,
-		     unsigned long liovcnt,
-		     const struct compat_iovec __user *rvec,
-		     unsigned long riovcnt,
-		     unsigned long flags, int vm_write)
-{
-	struct iovec iovstack_l[UIO_FASTIOV];
-	struct iovec iovstack_r[UIO_FASTIOV];
-	struct iovec *iov_l = iovstack_l;
-	struct iovec *iov_r = iovstack_r;
-	struct iov_iter iter_l, iter_r;
-	ssize_t rc = -EFAULT;
-	int dir = vm_write ? WRITE : READ;
-
-	if (flags != 0)
-		return -EINVAL;
-
-	rc = import_iovec(dir, (const struct iovec __user *)iov_l, liovcnt,
-			  UIO_FASTIOV, &iov_l, &iter_l);
-	if (rc < 0)
-		return rc;
-	if (!iov_iter_count(&iter_l))
-		goto free_iovecs;
-	rc = import_iovec(CHECK_IOVEC_ONLY, iov_r, riovcnt, UIO_FASTIOV, &iov_r,
-			  &iter_r);
-	if (rc <= 0)
-		goto free_iovecs;
-
-	rc = process_vm_rw_core(pid, &iter_l, iter_r.iov, iter_r.nr_segs,
-				flags, vm_write);
-
-free_iovecs:
-	if (iov_r != iovstack_r)
-		kfree(iov_r);
-	if (iov_l != iovstack_l)
-		kfree(iov_l);
-	return rc;
-}
-
-COMPAT_SYSCALL_DEFINE6(process_vm_readv, compat_pid_t, pid,
-		       const struct compat_iovec __user *, lvec,
-		       compat_ulong_t, liovcnt,
-		       const struct compat_iovec __user *, rvec,
-		       compat_ulong_t, riovcnt,
-		       compat_ulong_t, flags)
-{
-	return compat_process_vm_rw(pid, lvec, liovcnt, rvec,
-				    riovcnt, flags, 0);
-}
-
-COMPAT_SYSCALL_DEFINE6(process_vm_writev, compat_pid_t, pid,
-		       const struct compat_iovec __user *, lvec,
-		       compat_ulong_t, liovcnt,
-		       const struct compat_iovec __user *, rvec,
-		       compat_ulong_t, riovcnt,
-		       compat_ulong_t, flags)
-{
-	return compat_process_vm_rw(pid, lvec, liovcnt, rvec,
-				    riovcnt, flags, 1);
-}
-
-#endif
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index f2dcb0d5703014..c1dfe99c9c3f70 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -727,11 +727,9 @@ __SYSCALL(__NR_setns, sys_setns)
 #define __NR_sendmmsg 269
 __SC_COMP(__NR_sendmmsg, sys_sendmmsg, compat_sys_sendmmsg)
 #define __NR_process_vm_readv 270
-__SC_COMP(__NR_process_vm_readv, sys_process_vm_readv, \
-          compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 271
-__SC_COMP(__NR_process_vm_writev, sys_process_vm_writev, \
-          compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 272
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 273
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 26f0347c15118b..a188f053cbf90a 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -443,8 +443,8 @@
 348	common	syncfs				sys_syncfs
 349	common	sendmmsg			sys_sendmmsg			compat_sys_sendmmsg
 350	common	setns				sys_setns
-351	nospu	process_vm_readv		sys_process_vm_readv		compat_sys_process_vm_readv
-352	nospu	process_vm_writev		sys_process_vm_writev		compat_sys_process_vm_writev
+351	nospu	process_vm_readv		sys_process_vm_readv
+352	nospu	process_vm_writev		sys_process_vm_writev
 353	nospu	finit_module			sys_finit_module
 354	nospu	kcmp				sys_kcmp
 355	common	sched_setattr			sys_sched_setattr
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index 02ad81f69bb7e3..c44c83032c3a04 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -347,8 +347,8 @@
 337  common	clock_adjtime		sys_clock_adjtime		compat_sys_clock_adjtime
 338  common	syncfs			sys_syncfs			sys_syncfs
 339  common	setns			sys_setns			sys_setns
-340  common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-341  common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+340  common	process_vm_readv	sys_process_vm_readv		sys_process_vm_readv
+341  common	process_vm_writev	sys_process_vm_writev		sys_process_vm_writev
 342  common	s390_runtime_instr	sys_s390_runtime_instr		sys_s390_runtime_instr
 343  common	kcmp			sys_kcmp			compat_sys_kcmp
 344  common	finit_module		sys_finit_module		compat_sys_finit_module
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index 642af919183de4..347809649ba28f 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -395,8 +395,8 @@
 536	x32	rt_tgsigqueueinfo	compat_sys_rt_tgsigqueueinfo
 537	x32	recvmmsg		compat_sys_recvmmsg_time64
 538	x32	sendmmsg		compat_sys_sendmmsg
-539	x32	process_vm_readv	compat_sys_process_vm_readv
-540	x32	process_vm_writev	compat_sys_process_vm_writev
+539	x32	process_vm_readv	sys_process_vm_readv
+540	x32	process_vm_writev	sys_process_vm_writev
 541	x32	setsockopt		sys_setsockopt
 542	x32	getsockopt		sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 11/11] security/keys: remove compat_keyctl_instantiate_key_iov
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
                     ` (9 preceding siblings ...)
  2020-09-21 14:34   ` [PATCH 10/11] mm: remove compat_process_vm_{readv,writev} Christoph Hellwig
@ 2020-09-21 14:34   ` Christoph Hellwig
  10 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native version of
keyctl_instantiate_key_iov can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 security/keys/compat.c   | 36 ++----------------------------------
 security/keys/internal.h |  5 -----
 security/keys/keyctl.c   |  2 +-
 3 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/security/keys/compat.c b/security/keys/compat.c
index 7ae531db031cf8..1545efdca56227 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -11,38 +11,6 @@
 #include <linux/slab.h>
 #include "internal.h"
 
-/*
- * Instantiate a key with the specified compatibility multipart payload and
- * link the key into the destination keyring if one is given.
- *
- * The caller must have the appropriate instantiation permit set for this to
- * work (see keyctl_assume_authority).  No other permissions are required.
- *
- * If successful, 0 will be returned.
- */
-static long compat_keyctl_instantiate_key_iov(
-	key_serial_t id,
-	const struct compat_iovec __user *_payload_iov,
-	unsigned ioc,
-	key_serial_t ringid)
-{
-	struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
-	struct iov_iter from;
-	long ret;
-
-	if (!_payload_iov)
-		ioc = 0;
-
-	ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
-			   ioc, ARRAY_SIZE(iovstack), &iov, &from);
-	if (ret < 0)
-		return ret;
-
-	ret = keyctl_instantiate_key_common(id, &from, ringid);
-	kfree(iov);
-	return ret;
-}
-
 /*
  * The key control system call, 32-bit compatibility version for 64-bit archs
  */
@@ -113,8 +81,8 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option,
 		return keyctl_reject_key(arg2, arg3, arg4, arg5);
 
 	case KEYCTL_INSTANTIATE_IOV:
-		return compat_keyctl_instantiate_key_iov(
-			arg2, compat_ptr(arg3), arg4, arg5);
+		return keyctl_instantiate_key_iov(arg2, compat_ptr(arg3), arg4,
+						  arg5);
 
 	case KEYCTL_INVALIDATE:
 		return keyctl_invalidate_key(arg2);
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 338a526cbfa516..9b9cf3b6fcbb4d 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -262,11 +262,6 @@ extern long keyctl_instantiate_key_iov(key_serial_t,
 				       const struct iovec __user *,
 				       unsigned, key_serial_t);
 extern long keyctl_invalidate_key(key_serial_t);
-
-struct iov_iter;
-extern long keyctl_instantiate_key_common(key_serial_t,
-					  struct iov_iter *,
-					  key_serial_t);
 extern long keyctl_restrict_keyring(key_serial_t id,
 				    const char __user *_type,
 				    const char __user *_restriction);
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 9febd37a168fd0..e26bbccda7ccee 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -1164,7 +1164,7 @@ static int keyctl_change_reqkey_auth(struct key *key)
  *
  * If successful, 0 will be returned.
  */
-long keyctl_instantiate_key_common(key_serial_t id,
+static long keyctl_instantiate_key_common(key_serial_t id,
 				   struct iov_iter *from,
 				   key_serial_t ringid)
 {
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw()
  2020-09-21 14:34   ` [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw() Christoph Hellwig
@ 2020-09-21 14:48     ` Matthew Wilcox
  2020-09-21 15:02     ` Al Viro
  1 sibling, 0 replies; 171+ messages in thread
From: Matthew Wilcox @ 2020-09-21 14:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, David Laight, linux-arm-kernel, linux-kernel,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, sparclinux,
	linux-block, linux-scsi, linux-fsdevel, linux-aio, io-uring,
	linux-arch, linux-mm, netdev, keyrings, linux-security-module

On Mon, Sep 21, 2020 at 04:34:25PM +0200, Christoph Hellwig wrote:
>  {
> -	WARN_ON(direction & ~(READ | WRITE));
> +	WARN_ON(direction & ~(READ | WRITE | CHECK_IOVEC_ONLY));

This is now a no-op because:

include/linux/fs.h:#define CHECK_IOVEC_ONLY -1

I'd suggest we renumber it to 2?

(READ is 0, WRITE is 1.  This WARN_ON should probably be
	WARN_ON(direction > CHECK_IOVEC_ONLY)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw()
  2020-09-21 14:34   ` [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw() Christoph Hellwig
  2020-09-21 14:48     ` Matthew Wilcox
@ 2020-09-21 15:02     ` Al Viro
  2020-09-21 15:21       ` David Laight
  1 sibling, 1 reply; 171+ messages in thread
From: Al Viro @ 2020-09-21 15:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Mon, Sep 21, 2020 at 04:34:25PM +0200, Christoph Hellwig wrote:
> From: David Laight <David.Laight@ACULAB.COM>
> 
> This is the only direct call of rw_copy_check_uvector().  Removing it
> will allow rw_copy_check_uvector() to be inlined into import_iovec(),
> while only paying a minor price by setting up an otherwise unused
> iov_iter in the process_vm_readv/process_vm_writev syscalls that aren't
> in a super hot path.

> @@ -443,7 +443,7 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
>  			const struct iovec *iov, unsigned long nr_segs,
>  			size_t count)
>  {
> -	WARN_ON(direction & ~(READ | WRITE));
> +	WARN_ON(direction & ~(READ | WRITE | CHECK_IOVEC_ONLY));
>  	direction &= READ | WRITE;

Ugh...

> -	rc = rw_copy_check_uvector(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV,
> -				   iovstack_r, &iov_r);
> +	rc = import_iovec(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV, &iov_r,
> +			  &iter_r);
>  	if (rc <= 0)
>  		goto free_iovecs;
>  
> -	rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
> +	rc = process_vm_rw_core(pid, &iter_l, iter_r.iov, iter_r.nr_segs,
> +				flags, vm_write);

... and ugh^2, since now you are not only setting a meaningless iov_iter,
you are creating a new place that pokes directly into struct iov_iter
guts.

Sure, moving rw_copy_check_uvector() over to lib/iov_iter.c makes sense.
But I would rather split the access_ok()-related checks out of that thing
and bury CHECK_IOVEC_ONLY.

Step 1: move the damn thing to lib/iov_iter.c (same as you do, but without
making it static)

Step 2: split it in two:

ssize_t rw_copy_check_uvector(const struct iovec __user * uvector,
                              unsigned long nr_segs, unsigned long fast_segs,
                              struct iovec *fast_pointer,
                              struct iovec **ret_pointer)
{
	unsigned long seg;
	ssize_t ret;
	struct iovec *iov = fast_pointer;

	*ret_pointer = fast_pointer;
	/*
	 * SuS says "The readv() function *may* fail if the iovcnt argument
	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
	 * traditionally returned zero for zero segments, so...
	 */
	if (nr_segs = 0)
		return 0;

	/*
	 * First get the "struct iovec" from user memory and
	 * verify all the pointers
	 */
	if (nr_segs > UIO_MAXIOV)
		return -EINVAL;

	if (nr_segs > fast_segs) {
		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
		if (!iov)
			return -ENOMEM;
		*ret_pointer = iov;
	}
	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector)))
		return -EFAULT;

	/*
	 * According to the Single Unix Specification we should return EINVAL
	 * if an element length is < 0 when cast to ssize_t or if the
	 * total length would overflow the ssize_t return value of the
	 * system call.
	 *
	 * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
	 * overflow case.
	 */
	ret = 0;
	for (seg = 0; seg < nr_segs; seg++) {
		void __user *buf = iov[seg].iov_base;
		ssize_t len = (ssize_t)iov[seg].iov_len;

		/* see if we we're about to use an invalid len or if
		 * it's about to overflow ssize_t */
		if (len < 0)
			return -EINVAL;
		if (len > MAX_RW_COUNT - ret) {
			len = MAX_RW_COUNT - ret;
			iov[seg].iov_len = len;
		}
		ret += len;
	}
	return ret;
}

/*
 *  This is merely an early sanity check; we do _not_ rely upon
 *  it when we get to the actual memory accesses.
 */
static bool check_iovecs(const struct iovec *iov, int nr_segs)
{
        for (seg = 0; seg < nr_segs; seg++) {
                void __user *buf = iov[seg].iov_base;
                ssize_t len = (ssize_t)iov[seg].iov_len;

                if (unlikely(!access_ok(buf, len)))
                        return false;
        }
	return true;
}

ssize_t import_iovec(int type, const struct iovec __user * uvector,
                 unsigned nr_segs, unsigned fast_segs,
                 struct iovec **iov, struct iov_iter *i)
{
	struct iovec *p;
	ssize_t n;

	n = rw_copy_check_uvector(uvector, nr_segs, fast_segs, *iov, &p);
	if (n > 0 && !check_iovecs(p, nr_segs))
		n = -EFAULT;
	if (n < 0) {
		if (p != *iov)
			kfree(p);
		*iov = NULL;
		return n;
	}
	iov_iter_init(i, type, p, nr_segs, n);
	*iov = p = *iov ? NULL : p;
	return n;
}

kill CHECK_IOVEC_ONLY and use rw_copy_check_uvector() without the type
argument in mm/process_vm_access.c

Saner that way, IMO...

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH 04/11] iov_iter: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector
  2020-09-21 14:34   ` [PATCH 04/11] iov_iter: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector Christoph Hellwig
@ 2020-09-21 15:05     ` David Laight
  2020-09-21 15:11       ` Al Viro
  2020-09-21 15:07     ` Al Viro
  1 sibling, 1 reply; 171+ messages in thread
From: David Laight @ 2020-09-21 15:05 UTC (permalink / raw)
  To: 'Christoph Hellwig', Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: Christoph Hellwig
> Sent: 21 September 2020 15:34
> 
> Explicitly check for the magic value insted of implicitly relying on
> its numeric representation.   Also drop the rather pointless unlikely
> annotation.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  lib/iov_iter.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index d7e72343c360eb..a64867501a7483 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1709,8 +1709,7 @@ static ssize_t rw_copy_check_uvector(int type,
>  			ret = -EINVAL;
>  			goto out;
>  		}
> -		if (type >= 0
> -		    && unlikely(!access_ok(buf, len))) {
> +		if (type != CHECK_IOVEC_ONLY && !access_ok(buf, len)) {
>  			ret = -EFAULT;
>  			goto out;
>  		}
> @@ -1824,7 +1823,7 @@ static ssize_t compat_rw_copy_check_uvector(int type,
>  		}
>  		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
>  			goto out;
> -		if (type >= 0 &&
> +		if (type != CHECK_IOVEC_ONLY &&
>  		    !access_ok(compat_ptr(buf), len)) {
>  			ret = -EFAULT;
>  			goto out;
> --
> 2.28.0

I've actually no idea:
1) Why there is an access_ok() check here.
   It will be repeated by the user copy functions.
2) Why it isn't done when called from mm/process_vm_access.c.
   Ok, the addresses refer to a different process, but they
   must still be valid user addresses.

Is 2 a legacy from when access_ok() actually checked that the
addresses were mapped into the process's address space?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 04/11] iov_iter: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector
  2020-09-21 14:34   ` [PATCH 04/11] iov_iter: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector Christoph Hellwig
  2020-09-21 15:05     ` David Laight
@ 2020-09-21 15:07     ` Al Viro
  1 sibling, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-09-21 15:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Mon, Sep 21, 2020 at 04:34:27PM +0200, Christoph Hellwig wrote:
> Explicitly check for the magic value insted of implicitly relying on
> its numeric representation.   Also drop the rather pointless unlikely
> annotation.

See above - I would rather have CHECK_IOVEC_ONLY gone.

The reason for doing these access_ok() in the same loop is fairly weak,
especially since you are checking type on each iteration.  Might as
well do that in a separate loop afterwards.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 04/11] iov_iter: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector
  2020-09-21 15:05     ` David Laight
@ 2020-09-21 15:11       ` Al Viro
  2020-09-21 15:26         ` David Laight
  0 siblings, 1 reply; 171+ messages in thread
From: Al Viro @ 2020-09-21 15:11 UTC (permalink / raw)
  To: David Laight
  Cc: 'Christoph Hellwig',
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On Mon, Sep 21, 2020 at 03:05:32PM +0000, David Laight wrote:

> I've actually no idea:
> 1) Why there is an access_ok() check here.
>    It will be repeated by the user copy functions.

Early sanity check.

> 2) Why it isn't done when called from mm/process_vm_access.c.
>    Ok, the addresses refer to a different process, but they
>    must still be valid user addresses.
> 
> Is 2 a legacy from when access_ok() actually checked that the
> addresses were mapped into the process's address space?

It never did.  2 is for the situation when a 32bit process
accesses 64bit one; addresses that are perfectly legitimate
for 64bit userland (and fitting into the first 4Gb of address
space, so they can be represented by 32bit pointers just fine)
might be rejected by access_ok() if the caller is 32bit.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 05/11] iov_iter: merge the compat case into rw_copy_check_uvector
  2020-09-21 14:34   ` [PATCH 05/11] iov_iter: merge the compat case into rw_copy_check_uvector Christoph Hellwig
@ 2020-09-21 15:14     ` Al Viro
  2021-01-08 11:49     ` David Laight
  1 sibling, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-09-21 15:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Mon, Sep 21, 2020 at 04:34:28PM +0200, Christoph Hellwig wrote:
> +static int compat_copy_iovecs_from_user(struct iovec *iov,
> +		const struct iovec __user *uvector, unsigned long nr_segs)
> +{
> +	const struct compat_iovec __user *uiov > +		(const struct compat_iovec __user *)uvector;
> +	unsigned long i;
	         ^^^^
Huh?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 06/11] iov_iter: handle the compat case in import_iovec
  2020-09-21 14:34   ` [PATCH 06/11] iov_iter: handle the compat case in import_iovec Christoph Hellwig
@ 2020-09-21 15:20     ` Al Viro
  0 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-09-21 15:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Mon, Sep 21, 2020 at 04:34:29PM +0200, Christoph Hellwig wrote:
> Use in compat_syscall to import either native or the compat iovecs, and
> remove the now superflous compat_import_iovec, which removes the need for
> special compat logic in most callers.  Only io_uring needs special
> treatment given that it can call import_iovec from kernel threads acting
> on behalf of native or compat syscalls.  Expose the low-level
> __import_iovec helper and use it in io_uring to explicitly pick a iovec
> layout.

fs/aio.c part is not obvious...  Might be better to use __import_iovec()
there as well and leave the rest of aio.c changes to followup.

> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1683,7 +1683,7 @@ static int compat_copy_iovecs_from_user(struct iovec *iov,
>  	return ret;
>  }
>  
> -static ssize_t __import_iovec(int type, const struct iovec __user *uvector,
> +ssize_t __import_iovec(int type, const struct iovec __user *uvector,
>  		unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
>  		struct iov_iter *i, bool compat)
>  {

Don't make it static in the first place, perhaps?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw()
  2020-09-21 15:02     ` Al Viro
@ 2020-09-21 15:21       ` David Laight
  2020-09-21 15:29         ` Al Viro
  0 siblings, 1 reply; 171+ messages in thread
From: David Laight @ 2020-09-21 15:21 UTC (permalink / raw)
  To: 'Al Viro', Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: Al Viro
> Sent: 21 September 2020 16:02
> 
> On Mon, Sep 21, 2020 at 04:34:25PM +0200, Christoph Hellwig wrote:
> > From: David Laight <David.Laight@ACULAB.COM>
> >
> > This is the only direct call of rw_copy_check_uvector().  Removing it
> > will allow rw_copy_check_uvector() to be inlined into import_iovec(),
> > while only paying a minor price by setting up an otherwise unused
> > iov_iter in the process_vm_readv/process_vm_writev syscalls that aren't
> > in a super hot path.
> 
> > @@ -443,7 +443,7 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
> >  			const struct iovec *iov, unsigned long nr_segs,
> >  			size_t count)
> >  {
> > -	WARN_ON(direction & ~(READ | WRITE));
> > +	WARN_ON(direction & ~(READ | WRITE | CHECK_IOVEC_ONLY));
> >  	direction &= READ | WRITE;
> 
> Ugh...
> 
> > -	rc = rw_copy_check_uvector(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV,
> > -				   iovstack_r, &iov_r);
> > +	rc = import_iovec(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV, &iov_r,
> > +			  &iter_r);
> >  	if (rc <= 0)
> >  		goto free_iovecs;
> >
> > -	rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
> > +	rc = process_vm_rw_core(pid, &iter_l, iter_r.iov, iter_r.nr_segs,
> > +				flags, vm_write);
> 
> ... and ugh^2, since now you are not only setting a meaningless iov_iter,
> you are creating a new place that pokes directly into struct iov_iter
> guts.
> 
> Sure, moving rw_copy_check_uvector() over to lib/iov_iter.c makes sense.
> But I would rather split the access_ok()-related checks out of that thing
> and bury CHECK_IOVEC_ONLY.
> 
> Step 1: move the damn thing to lib/iov_iter.c (same as you do, but without
> making it static)
> 
> Step 2: split it in two:
> 
> ssize_t rw_copy_check_uvector(const struct iovec __user * uvector,
>                               unsigned long nr_segs, unsigned long fast_segs,
>                               struct iovec *fast_pointer,
>                               struct iovec **ret_pointer)
> {
> 	unsigned long seg;
...
> 	ret = 0;
> 	for (seg = 0; seg < nr_segs; seg++) {
> 		void __user *buf = iov[seg].iov_base;
> 		ssize_t len = (ssize_t)iov[seg].iov_len;
> 
> 		/* see if we we're about to use an invalid len or if
> 		 * it's about to overflow ssize_t */
> 		if (len < 0)
> 			return -EINVAL;
> 		if (len > MAX_RW_COUNT - ret) {
> 			len = MAX_RW_COUNT - ret;
> 			iov[seg].iov_len = len;
> 		}
> 		ret += len;
> 	}
> 	return ret;
> }
> 
> /*
>  *  This is merely an early sanity check; we do _not_ rely upon
>  *  it when we get to the actual memory accesses.
>  */
> static bool check_iovecs(const struct iovec *iov, int nr_segs)
> {
>         for (seg = 0; seg < nr_segs; seg++) {
>                 void __user *buf = iov[seg].iov_base;
>                 ssize_t len = (ssize_t)iov[seg].iov_len;
> 
>                 if (unlikely(!access_ok(buf, len)))
>                         return false;
>         }
> 	return true;
> }

You really don't want to be looping through the array twice.
In fact you don't really want to be doing all those tests at all.
This code makes a significant fraction of the not-insignificant
difference between the 'costs' of send() and sendmsg().

I think the 'length' check can be optimised to do something like:
	for (...) {
		ssize_t len = (ssize_t)iov[seg].iov_len;
		ret += len;
		len_hi += (unsigned long)len >> 20;
	}
	if (len_hi) {
		/* Something potentially odd in the lengths.
		 * Might just be a very long fragment.
		 * Check the individial values. */
		Add the exiting loop here.
	}

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH 04/11] iov_iter: explicitly check for CHECK_IOVEC_ONLY in rw_copy_check_uvector
  2020-09-21 15:11       ` Al Viro
@ 2020-09-21 15:26         ` David Laight
  0 siblings, 0 replies; 171+ messages in thread
From: David Laight @ 2020-09-21 15:26 UTC (permalink / raw)
  To: 'Al Viro'
  Cc: 'Christoph Hellwig',
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: Al Viro
> Sent: 21 September 2020 16:11
> On Mon, Sep 21, 2020 at 03:05:32PM +0000, David Laight wrote:
> 
> > I've actually no idea:
> > 1) Why there is an access_ok() check here.
> >    It will be repeated by the user copy functions.
> 
> Early sanity check.
> 
> > 2) Why it isn't done when called from mm/process_vm_access.c.
> >    Ok, the addresses refer to a different process, but they
> >    must still be valid user addresses.
> >
> > Is 2 a legacy from when access_ok() actually checked that the
> > addresses were mapped into the process's address space?
> 
> It never did.  2 is for the situation when a 32bit process
> accesses 64bit one; addresses that are perfectly legitimate
> for 64bit userland (and fitting into the first 4Gb of address
> space, so they can be represented by 32bit pointers just fine)
> might be rejected by access_ok() if the caller is 32bit.

Can't 32 bit processes on a 64bit system access all the way to 4GB?
Mapping things by default above 3GB will probably break things.
But there is no reason to disallow explicit maps.
And in any case access_ok() can use the same limit as it does for
64bit processes - the page fault handler will sort it all out.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw()
  2020-09-21 15:21       ` David Laight
@ 2020-09-21 15:29         ` Al Viro
  2020-09-21 15:44           ` David Laight
  2020-09-21 16:12           ` Christoph Hellwig
  0 siblings, 2 replies; 171+ messages in thread
From: Al Viro @ 2020-09-21 15:29 UTC (permalink / raw)
  To: David Laight
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Mon, Sep 21, 2020 at 03:21:35PM +0000, David Laight wrote:

> You really don't want to be looping through the array twice.

Profiles, please.

> I think the 'length' check can be optimised to do something like:
> 	for (...) {
> 		ssize_t len = (ssize_t)iov[seg].iov_len;
> 		ret += len;
> 		len_hi += (unsigned long)len >> 20;
> 	}
> 	if (len_hi) {
> 		/* Something potentially odd in the lengths.
> 		 * Might just be a very long fragment.
> 		 * Check the individial values. */
> 		Add the exiting loop here.
> 	}

Far too ugly to live.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw()
  2020-09-21 15:29         ` Al Viro
@ 2020-09-21 15:44           ` David Laight
  2020-09-21 16:27             ` Al Viro
  2020-09-21 16:12           ` Christoph Hellwig
  1 sibling, 1 reply; 171+ messages in thread
From: David Laight @ 2020-09-21 15:44 UTC (permalink / raw)
  To: 'Al Viro'
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

From: Al Viro
> Sent: 21 September 2020 16:30
> 
> On Mon, Sep 21, 2020 at 03:21:35PM +0000, David Laight wrote:
> 
> > You really don't want to be looping through the array twice.
> 
> Profiles, please.

I did some profiling of send() v sendmsg() much earlier in the year.
I can't remember the exact details but the extra cost of sendmsg()
is far more than you might expect.
(I was timing sending fully built IPv4 UDP packets using a raw socket.)

About half the difference does away if you change the
copy_from_user() to __copy_from_user() when reading the struct msghdr
and iov[] from userspace (user copy hardening is expensive).

The rest is just code path, my gut feeling is that a lot of that
is in import_iovec().

Remember semdmsg() is likely to be called with an iov count of 1.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-19 22:22               ` Andy Lutomirski
@ 2020-09-21 16:10                 ` Pavel Begunkov
  2020-09-21 16:13                   ` Pavel Begunkov
  0 siblings, 1 reply; 171+ messages in thread
From: Pavel Begunkov @ 2020-09-21 16:10 UTC (permalink / raw)
  To: Andy Lutomirski, Arnd Bergmann
  Cc: Andy Lutomirski, Christoph Hellwig, Al Viro, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On 20/09/2020 01:22, Andy Lutomirski wrote:
> 
>> On Sep 19, 2020, at 2:16 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> ´╗┐On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <luto@kernel.org> wrote:
>>>> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote:
>>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
>>>>> Said that, why not provide a variant that would take an explicit
>>>>> "is it compat" argument and use it there?  And have the normal
>>>>> one pass in_compat_syscall() to that...
>>>>
>>>> That would help to not introduce a regression with this series yes.
>>>> But it wouldn't fix existing bugs when io_uring is used to access
>>>> read or write methods that use in_compat_syscall().  One example that
>>>> I recently ran into is drivers/scsi/sg.c.
>>
>> Ah, so reading /dev/input/event* would suffer from the same issue,
>> and that one would in fact be broken by your patch in the hypothetical
>> case that someone tried to use io_uring to read /dev/input/event on x32...
>>
>> For reference, I checked the socket timestamp handling that has a
>> number of corner cases with time32/time64 formats in compat mode,
>> but none of those appear to be affected by the problem.
>>
>>> Aside from the potentially nasty use of per-task variables, one thing
>>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
>>> going to have a generic mechanism for this, shouldn't we allow a full
>>> override of the syscall arch instead of just allowing forcing compat
>>> so that a compat syscall can do a non-compat operation?
>>
>> The only reason it's needed here is that the caller is in a kernel
>> thread rather than a system call. Are there any possible scenarios
>> where one would actually need the opposite?
>>
> 
> I can certainly imagine needing to force x32 mode from a kernel thread.
> 
> As for the other direction: what exactly are the desired bitness/arch semantics of io_uring?  Is the operation bitness chosen by the io_uring creation or by the io_uring_enter() bitness?

It's rather the second one. Even though AFAIR it wasn't discussed
specifically, that how it works now (_partially_).

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw()
  2020-09-21 15:29         ` Al Viro
  2020-09-21 15:44           ` David Laight
@ 2020-09-21 16:12           ` Christoph Hellwig
  1 sibling, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-21 16:12 UTC (permalink / raw)
  To: Al Viro
  Cc: David Laight, Christoph Hellwig, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, linux-arm-kernel, linux-kernel,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, sparclinux,
	linux-block, linux-scsi, linux-fsdevel, linux-aio, io-uring,
	linux-arch, linux-mm, netdev, keyrings, linux-security-module

On Mon, Sep 21, 2020 at 04:29:37PM +0100, Al Viro wrote:
> On Mon, Sep 21, 2020 at 03:21:35PM +0000, David Laight wrote:
> 
> > You really don't want to be looping through the array twice.
> 
> Profiles, please.

Given that the iov array should be cache hot I'd be surprised to
see a huge difference.  

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-21 16:10                 ` Pavel Begunkov
@ 2020-09-21 16:13                   ` Pavel Begunkov
  2020-09-21 23:51                     ` Andy Lutomirski
  0 siblings, 1 reply; 171+ messages in thread
From: Pavel Begunkov @ 2020-09-21 16:13 UTC (permalink / raw)
  To: Andy Lutomirski, Arnd Bergmann
  Cc: Andy Lutomirski, Christoph Hellwig, Al Viro, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On 21/09/2020 19:10, Pavel Begunkov wrote:
> On 20/09/2020 01:22, Andy Lutomirski wrote:
>>
>>> On Sep 19, 2020, at 2:16 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>
>>> ´╗┐On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <luto@kernel.org> wrote:
>>>>> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote:
>>>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
>>>>>> Said that, why not provide a variant that would take an explicit
>>>>>> "is it compat" argument and use it there?  And have the normal
>>>>>> one pass in_compat_syscall() to that...
>>>>>
>>>>> That would help to not introduce a regression with this series yes.
>>>>> But it wouldn't fix existing bugs when io_uring is used to access
>>>>> read or write methods that use in_compat_syscall().  One example that
>>>>> I recently ran into is drivers/scsi/sg.c.
>>>
>>> Ah, so reading /dev/input/event* would suffer from the same issue,
>>> and that one would in fact be broken by your patch in the hypothetical
>>> case that someone tried to use io_uring to read /dev/input/event on x32...
>>>
>>> For reference, I checked the socket timestamp handling that has a
>>> number of corner cases with time32/time64 formats in compat mode,
>>> but none of those appear to be affected by the problem.
>>>
>>>> Aside from the potentially nasty use of per-task variables, one thing
>>>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
>>>> going to have a generic mechanism for this, shouldn't we allow a full
>>>> override of the syscall arch instead of just allowing forcing compat
>>>> so that a compat syscall can do a non-compat operation?
>>>
>>> The only reason it's needed here is that the caller is in a kernel
>>> thread rather than a system call. Are there any possible scenarios
>>> where one would actually need the opposite?
>>>
>>
>> I can certainly imagine needing to force x32 mode from a kernel thread.
>>
>> As for the other direction: what exactly are the desired bitness/arch semantics of io_uring?  Is the operation bitness chosen by the io_uring creation or by the io_uring_enter() bitness?
> 
> It's rather the second one. Even though AFAIR it wasn't discussed
> specifically, that how it works now (_partially_).

Double checked -- I'm wrong, that's the former one. Most of it is based
on a flag that was set an creation.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 15:55     ` William Kucharski
@ 2020-09-21 16:20       ` Pavel Begunkov
  0 siblings, 0 replies; 171+ messages in thread
From: Pavel Begunkov @ 2020-09-21 16:20 UTC (permalink / raw)
  To: William Kucharski, Matthew Wilcox
  Cc: Christoph Hellwig, Alexander Viro, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, linux-arm-kernel, x86,
	linux-kernel, linux-mips, linux-parisc, linuxppc-dev, linux-s390,
	sparclinux, linux-block, linux-scsi, linux-fsdevel, linux-aio,
	io-uring, linux-arch, linux-mm, netdev, keyrings,
	linux-security-module

On 20/09/2020 18:55, William Kucharski wrote:
> I really like that as it’s self-documenting and anyone debugging it can see what is actually being used at a glance.

Also creates special cases for things that few people care about,
and makes it a pain for cross-platform (cross-bitness) development.

> 
>> On Sep 20, 2020, at 09:15, Matthew Wilcox <willy@infradead.org> wrote:
>>
>> On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote:
>>> Add a flag to force processing a syscall as a compat syscall.  This is
>>> required so that in_compat_syscall() works for I/O submitted by io_uring
>>> helper threads on behalf of compat syscalls.
>>
>> Al doesn't like this much, but my suggestion is to introduce two new
>> opcodes -- IORING_OP_READV32 and IORING_OP_WRITEV32.  The compat code
>> can translate IORING_OP_READV to IORING_OP_READV32 and then the core
>> code can know what that user pointer is pointing to.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 19:22           ` Matthew Wilcox
  2020-09-20 19:28             ` Andy Lutomirski
  2020-09-20 21:42             ` Al Viro
@ 2020-09-21 16:26             ` Pavel Begunkov
  2 siblings, 0 replies; 171+ messages in thread
From: Pavel Begunkov @ 2020-09-21 16:26 UTC (permalink / raw)
  To: Matthew Wilcox, Al Viro
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, x86, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On 20/09/2020 22:22, Matthew Wilcox wrote:
> On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
>> IMO it's much saner to mark those and refuse to touch them from io_uring...
> 
> Simpler solution is to remove io_uring from the 32-bit syscall list.
> If you're a 32-bit process, you don't get to use io_uring.  Would
> any real users actually care about that?

There were .net and\or wine (which AFAIK often works in compat) guys
experimenting with io_uring, they might want it.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 02/11] mm: call import_iovec() instead of rw_copy_check_uvector() in process_vm_rw()
  2020-09-21 15:44           ` David Laight
@ 2020-09-21 16:27             ` Al Viro
  0 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-09-21 16:27 UTC (permalink / raw)
  To: David Laight
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Mon, Sep 21, 2020 at 03:44:00PM +0000, David Laight wrote:
> From: Al Viro
> > Sent: 21 September 2020 16:30
> > 
> > On Mon, Sep 21, 2020 at 03:21:35PM +0000, David Laight wrote:
> > 
> > > You really don't want to be looping through the array twice.
> > 
> > Profiles, please.
> 
> I did some profiling of send() v sendmsg() much earlier in the year.
> I can't remember the exact details but the extra cost of sendmsg()
> is far more than you might expect.
> (I was timing sending fully built IPv4 UDP packets using a raw socket.)
> 
> About half the difference does away if you change the
> copy_from_user() to __copy_from_user() when reading the struct msghdr
> and iov[] from userspace (user copy hardening is expensive).

Wha...?  So the difference is within 4 times the overhead of the
hardening checks done for one call of copy_from_user()?

> The rest is just code path, my gut feeling is that a lot of that
> is in import_iovec().
> 
> Remember semdmsg() is likely to be called with an iov count of 1.

... which makes that loop unlikely to be noticable in the entire
mess, whether you pass it once or twice.  IOW, unless you can show
profiles where that loop is sufficiently hot or if you can show
the timings change from splitting it in two, I'll remain very
sceptical about that assertion.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-20 21:13                 ` David Laight
@ 2020-09-21 16:31                   ` Pavel Begunkov
  0 siblings, 0 replies; 171+ messages in thread
From: Pavel Begunkov @ 2020-09-21 16:31 UTC (permalink / raw)
  To: David Laight, 'Arnd Bergmann', Andy Lutomirski
  Cc: Matthew Wilcox, Al Viro, Christoph Hellwig, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On 21/09/2020 00:13, David Laight wrote:
> From: Arnd Bergmann
>> Sent: 20 September 2020 21:49
>>
>> On Sun, Sep 20, 2020 at 9:28 PM Andy Lutomirski <luto@kernel.org> wrote:
>>> On Sun, Sep 20, 2020 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote:
>>>>
>>>> On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
>>>>> IMO it's much saner to mark those and refuse to touch them from io_uring...
>>>>
>>>> Simpler solution is to remove io_uring from the 32-bit syscall list.
>>>> If you're a 32-bit process, you don't get to use io_uring.  Would
>>>> any real users actually care about that?
>>>
>>> We could go one step farther and declare that we're done adding *any*
>>> new compat syscalls :)
>>
>> Would you also stop adding system calls to native 32-bit systems then?
>>
>> On memory constrained systems (less than 2GB a.t.m.), there is still a
>> strong demand for running 32-bit user space, but all of the recent Arm
>> cores (after Cortex-A55) dropped the ability to run 32-bit kernels, so
>> that compat mode may eventually become the primary way to run
>> Linux on cheap embedded systems.
>>
>> I don't think there is any chance we can realistically take away io_uring
>> from the 32-bit ABI any more now.
> 
> Can't it just run requests from 32bit apps in a kernel thread that has
> the 'in_compat_syscall' flag set?
> Not that i recall seeing the code where it saves the 'compat' nature
> of any requests.
> 
> It is already completely f*cked if you try to pass the command ring
> to a child process - it uses the wrong 'mm'.

And how so? io_uring uses mm of a submitter. The exception is SQPOLL
mode, but it requires CAP_SYS_ADMIN or CAP_SYS_NICE anyway.

> I suspect there are some really horrid security holes in that area.
> 
> 	David.
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-21 16:13                   ` Pavel Begunkov
@ 2020-09-21 23:51                     ` Andy Lutomirski
  2020-09-22  0:22                       ` Pavel Begunkov
  0 siblings, 1 reply; 171+ messages in thread
From: Andy Lutomirski @ 2020-09-21 23:51 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Arnd Bergmann, Andy Lutomirski, Christoph Hellwig, Al Viro,
	Andrew Morton, Jens Axboe, David Howells, linux-arm-kernel,
	X86 ML, LKML, open list:MIPS, Parisc List, linuxppc-dev,
	linux-s390, sparclinux, linux-block, Linux SCSI List,
	Linux FS Devel, linux-aio, io-uring, linux-arch, Linux-MM,
	Network Development, keyrings, LSM List

On Mon, Sep 21, 2020 at 9:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 21/09/2020 19:10, Pavel Begunkov wrote:
> > On 20/09/2020 01:22, Andy Lutomirski wrote:
> >>
> >>> On Sep 19, 2020, at 2:16 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> >>>
> >>> ´╗┐On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <luto@kernel.org> wrote:
> >>>>> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote:
> >>>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> >>>>>> Said that, why not provide a variant that would take an explicit
> >>>>>> "is it compat" argument and use it there?  And have the normal
> >>>>>> one pass in_compat_syscall() to that...
> >>>>>
> >>>>> That would help to not introduce a regression with this series yes.
> >>>>> But it wouldn't fix existing bugs when io_uring is used to access
> >>>>> read or write methods that use in_compat_syscall().  One example that
> >>>>> I recently ran into is drivers/scsi/sg.c.
> >>>
> >>> Ah, so reading /dev/input/event* would suffer from the same issue,
> >>> and that one would in fact be broken by your patch in the hypothetical
> >>> case that someone tried to use io_uring to read /dev/input/event on x32...
> >>>
> >>> For reference, I checked the socket timestamp handling that has a
> >>> number of corner cases with time32/time64 formats in compat mode,
> >>> but none of those appear to be affected by the problem.
> >>>
> >>>> Aside from the potentially nasty use of per-task variables, one thing
> >>>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
> >>>> going to have a generic mechanism for this, shouldn't we allow a full
> >>>> override of the syscall arch instead of just allowing forcing compat
> >>>> so that a compat syscall can do a non-compat operation?
> >>>
> >>> The only reason it's needed here is that the caller is in a kernel
> >>> thread rather than a system call. Are there any possible scenarios
> >>> where one would actually need the opposite?
> >>>
> >>
> >> I can certainly imagine needing to force x32 mode from a kernel thread.
> >>
> >> As for the other direction: what exactly are the desired bitness/arch semantics of io_uring?  Is the operation bitness chosen by the io_uring creation or by the io_uring_enter() bitness?
> >
> > It's rather the second one. Even though AFAIR it wasn't discussed
> > specifically, that how it works now (_partially_).
>
> Double checked -- I'm wrong, that's the former one. Most of it is based
> on a flag that was set an creation.
>

Could we get away with making io_uring_enter() return -EINVAL (or
maybe -ENOTTY?) if you try to do it with bitness that doesn't match
the io_uring?  And disable SQPOLL in compat mode?

--Andy

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-21 23:51                     ` Andy Lutomirski
@ 2020-09-22  0:22                       ` Pavel Begunkov
  2020-09-22  0:58                         ` Andy Lutomirski
  0 siblings, 1 reply; 171+ messages in thread
From: Pavel Begunkov @ 2020-09-22  0:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Arnd Bergmann, Christoph Hellwig, Al Viro, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List



On 22/09/2020 02:51, Andy Lutomirski wrote:
> On Mon, Sep 21, 2020 at 9:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 21/09/2020 19:10, Pavel Begunkov wrote:
>>> On 20/09/2020 01:22, Andy Lutomirski wrote:
>>>>
>>>>> On Sep 19, 2020, at 2:16 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>>
>>>>> ´╗┐On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <luto@kernel.org> wrote:
>>>>>>> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote:
>>>>>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
>>>>>>>> Said that, why not provide a variant that would take an explicit
>>>>>>>> "is it compat" argument and use it there?  And have the normal
>>>>>>>> one pass in_compat_syscall() to that...
>>>>>>>
>>>>>>> That would help to not introduce a regression with this series yes.
>>>>>>> But it wouldn't fix existing bugs when io_uring is used to access
>>>>>>> read or write methods that use in_compat_syscall().  One example that
>>>>>>> I recently ran into is drivers/scsi/sg.c.
>>>>>
>>>>> Ah, so reading /dev/input/event* would suffer from the same issue,
>>>>> and that one would in fact be broken by your patch in the hypothetical
>>>>> case that someone tried to use io_uring to read /dev/input/event on x32...
>>>>>
>>>>> For reference, I checked the socket timestamp handling that has a
>>>>> number of corner cases with time32/time64 formats in compat mode,
>>>>> but none of those appear to be affected by the problem.
>>>>>
>>>>>> Aside from the potentially nasty use of per-task variables, one thing
>>>>>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
>>>>>> going to have a generic mechanism for this, shouldn't we allow a full
>>>>>> override of the syscall arch instead of just allowing forcing compat
>>>>>> so that a compat syscall can do a non-compat operation?
>>>>>
>>>>> The only reason it's needed here is that the caller is in a kernel
>>>>> thread rather than a system call. Are there any possible scenarios
>>>>> where one would actually need the opposite?
>>>>>
>>>>
>>>> I can certainly imagine needing to force x32 mode from a kernel thread.
>>>>
>>>> As for the other direction: what exactly are the desired bitness/arch semantics of io_uring?  Is the operation bitness chosen by the io_uring creation or by the io_uring_enter() bitness?
>>>
>>> It's rather the second one. Even though AFAIR it wasn't discussed
>>> specifically, that how it works now (_partially_).
>>
>> Double checked -- I'm wrong, that's the former one. Most of it is based
>> on a flag that was set an creation.
>>
> 
> Could we get away with making io_uring_enter() return -EINVAL (or
> maybe -ENOTTY?) if you try to do it with bitness that doesn't match
> the io_uring?  And disable SQPOLL in compat mode?

Something like below. If PF_FORCE_COMPAT or any other solution
doesn't lend by the time, I'll take a look whether other io_uring's
syscalls need similar checks, etc.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0458f02d4ca8..aab20785fa9a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8671,6 +8671,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	if (ctx->flags & IORING_SETUP_R_DISABLED)
 		goto out;
 
+	ret = -EINVAl;
+	if (ctx->compat != in_compat_syscall())
+		goto out;
+
 	/*
 	 * For SQ polling, the thread will do all submissions and completions.
 	 * Just return the requested submit count, and wake the thread if
@@ -9006,6 +9010,10 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
 	if (ret)
 		goto err;
 
+	ret = -EINVAL;
+	if (ctx->compat)
+		goto err;
+
 	/* Only gets the ring fd, doesn't install it in the file table */
 	fd = io_uring_get_fd(ctx, &file);
 	if (fd < 0) {
-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-22  0:22                       ` Pavel Begunkov
@ 2020-09-22  0:58                         ` Andy Lutomirski
  2020-09-22  6:30                           ` Pavel Begunkov
  0 siblings, 1 reply; 171+ messages in thread
From: Andy Lutomirski @ 2020-09-22  0:58 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Andy Lutomirski, Arnd Bergmann, Christoph Hellwig, Al Viro,
	Andrew Morton, Jens Axboe, David Howells, linux-arm-kernel,
	X86 ML, LKML, open list:MIPS, Parisc List, linuxppc-dev,
	linux-s390, sparclinux, linux-block, Linux SCSI List,
	Linux FS Devel, linux-aio, io-uring, linux-arch, Linux-MM,
	Network Development, keyrings, LSM List

On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
>
>
> On 22/09/2020 02:51, Andy Lutomirski wrote:
> > On Mon, Sep 21, 2020 at 9:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >>
> >> On 21/09/2020 19:10, Pavel Begunkov wrote:
> >>> On 20/09/2020 01:22, Andy Lutomirski wrote:
> >>>>
> >>>>> On Sep 19, 2020, at 2:16 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> >>>>>
> >>>>> ´╗┐On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <luto@kernel.org> wrote:
> >>>>>>> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote:
> >>>>>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> >>>>>>>> Said that, why not provide a variant that would take an explicit
> >>>>>>>> "is it compat" argument and use it there?  And have the normal
> >>>>>>>> one pass in_compat_syscall() to that...
> >>>>>>>
> >>>>>>> That would help to not introduce a regression with this series yes.
> >>>>>>> But it wouldn't fix existing bugs when io_uring is used to access
> >>>>>>> read or write methods that use in_compat_syscall().  One example that
> >>>>>>> I recently ran into is drivers/scsi/sg.c.
> >>>>>
> >>>>> Ah, so reading /dev/input/event* would suffer from the same issue,
> >>>>> and that one would in fact be broken by your patch in the hypothetical
> >>>>> case that someone tried to use io_uring to read /dev/input/event on x32...
> >>>>>
> >>>>> For reference, I checked the socket timestamp handling that has a
> >>>>> number of corner cases with time32/time64 formats in compat mode,
> >>>>> but none of those appear to be affected by the problem.
> >>>>>
> >>>>>> Aside from the potentially nasty use of per-task variables, one thing
> >>>>>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
> >>>>>> going to have a generic mechanism for this, shouldn't we allow a full
> >>>>>> override of the syscall arch instead of just allowing forcing compat
> >>>>>> so that a compat syscall can do a non-compat operation?
> >>>>>
> >>>>> The only reason it's needed here is that the caller is in a kernel
> >>>>> thread rather than a system call. Are there any possible scenarios
> >>>>> where one would actually need the opposite?
> >>>>>
> >>>>
> >>>> I can certainly imagine needing to force x32 mode from a kernel thread.
> >>>>
> >>>> As for the other direction: what exactly are the desired bitness/arch semantics of io_uring?  Is the operation bitness chosen by the io_uring creation or by the io_uring_enter() bitness?
> >>>
> >>> It's rather the second one. Even though AFAIR it wasn't discussed
> >>> specifically, that how it works now (_partially_).
> >>
> >> Double checked -- I'm wrong, that's the former one. Most of it is based
> >> on a flag that was set an creation.
> >>
> >
> > Could we get away with making io_uring_enter() return -EINVAL (or
> > maybe -ENOTTY?) if you try to do it with bitness that doesn't match
> > the io_uring?  And disable SQPOLL in compat mode?
>
> Something like below. If PF_FORCE_COMPAT or any other solution
> doesn't lend by the time, I'll take a look whether other io_uring's
> syscalls need similar checks, etc.
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 0458f02d4ca8..aab20785fa9a 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8671,6 +8671,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>         if (ctx->flags & IORING_SETUP_R_DISABLED)
>                 goto out;
>
> +       ret = -EINVAl;
> +       if (ctx->compat != in_compat_syscall())
> +               goto out;
> +

This seems entirely reasonable to me.  Sharing an io_uring ring
between programs with different ABIs seems a bit nutty.

>         /*
>          * For SQ polling, the thread will do all submissions and completions.
>          * Just return the requested submit count, and wake the thread if
> @@ -9006,6 +9010,10 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
>         if (ret)
>                 goto err;
>
> +       ret = -EINVAL;
> +       if (ctx->compat)
> +               goto err;
> +

I may be looking at a different kernel than you, but aren't you
preventing creating an io_uring regardless of whether SQPOLL is
requested?

>         /* Only gets the ring fd, doesn't install it in the file table */
>         fd = io_uring_get_fd(ctx, &file);
>         if (fd < 0) {
> --
> Pavel Begunkov

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-22  0:58                         ` Andy Lutomirski
@ 2020-09-22  6:30                           ` Pavel Begunkov
  2020-09-22  7:23                             ` Arnd Bergmann
  0 siblings, 1 reply; 171+ messages in thread
From: Pavel Begunkov @ 2020-09-22  6:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Arnd Bergmann, Christoph Hellwig, Al Viro, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On 22/09/2020 03:58, Andy Lutomirski wrote:
> On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>>>> Ah, so reading /dev/input/event* would suffer from the same issue,
>>>>>>> and that one would in fact be broken by your patch in the hypothetical
>>>>>>> case that someone tried to use io_uring to read /dev/input/event on x32...
>>>>>>>
>>>>>>> For reference, I checked the socket timestamp handling that has a
>>>>>>> number of corner cases with time32/time64 formats in compat mode,
>>>>>>> but none of those appear to be affected by the problem.
>>>>>>>
>>>>>>>> Aside from the potentially nasty use of per-task variables, one thing
>>>>>>>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
>>>>>>>> going to have a generic mechanism for this, shouldn't we allow a full
>>>>>>>> override of the syscall arch instead of just allowing forcing compat
>>>>>>>> so that a compat syscall can do a non-compat operation?
>>>>>>>
>>>>>>> The only reason it's needed here is that the caller is in a kernel
>>>>>>> thread rather than a system call. Are there any possible scenarios
>>>>>>> where one would actually need the opposite?
>>>>>>>
>>>>>>
>>>>>> I can certainly imagine needing to force x32 mode from a kernel thread.
>>>>>>
>>>>>> As for the other direction: what exactly are the desired bitness/arch semantics of io_uring?  Is the operation bitness chosen by the io_uring creation or by the io_uring_enter() bitness?
>>>>>
>>>>> It's rather the second one. Even though AFAIR it wasn't discussed
>>>>> specifically, that how it works now (_partially_).
>>>>
>>>> Double checked -- I'm wrong, that's the former one. Most of it is based
>>>> on a flag that was set an creation.
>>>>
>>>
>>> Could we get away with making io_uring_enter() return -EINVAL (or
>>> maybe -ENOTTY?) if you try to do it with bitness that doesn't match
>>> the io_uring?  And disable SQPOLL in compat mode?
>>
>> Something like below. If PF_FORCE_COMPAT or any other solution
>> doesn't lend by the time, I'll take a look whether other io_uring's
>> syscalls need similar checks, etc.
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 0458f02d4ca8..aab20785fa9a 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -8671,6 +8671,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>>         if (ctx->flags & IORING_SETUP_R_DISABLED)
>>                 goto out;
>>
>> +       ret = -EINVAl;
>> +       if (ctx->compat != in_compat_syscall())
>> +               goto out;
>> +
> 
> This seems entirely reasonable to me.  Sharing an io_uring ring
> between programs with different ABIs seems a bit nutty.
> 
>>         /*
>>          * For SQ polling, the thread will do all submissions and completions.
>>          * Just return the requested submit count, and wake the thread if
>> @@ -9006,6 +9010,10 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
>>         if (ret)
>>                 goto err;
>>
>> +       ret = -EINVAL;
>> +       if (ctx->compat)
>> +               goto err;
>> +
> 
> I may be looking at a different kernel than you, but aren't you
> preventing creating an io_uring regardless of whether SQPOLL is
> requested?

I diffed a not-saved file on a sleepy head, thanks for noticing.
As you said, there should be an SQPOLL check.

...
if (ctx->compat && (p->flags & IORING_SETUP_SQPOLL))
	goto err;

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-22  6:30                           ` Pavel Begunkov
@ 2020-09-22  7:23                             ` Arnd Bergmann
  2020-09-22  7:57                               ` Pavel Begunkov
  0 siblings, 1 reply; 171+ messages in thread
From: Arnd Bergmann @ 2020-09-22  7:23 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Andy Lutomirski, Christoph Hellwig, Al Viro, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On Tue, Sep 22, 2020 at 8:32 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> On 22/09/2020 03:58, Andy Lutomirski wrote:
> > On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
> > I may be looking at a different kernel than you, but aren't you
> > preventing creating an io_uring regardless of whether SQPOLL is
> > requested?
>
> I diffed a not-saved file on a sleepy head, thanks for noticing.
> As you said, there should be an SQPOLL check.
>
> ...
> if (ctx->compat && (p->flags & IORING_SETUP_SQPOLL))
>         goto err;

Wouldn't that mean that now 32-bit containers behave differently
between compat and native execution?

I think if you want to prevent 32-bit applications from using SQPOLL,
it needs to be done the same way on both to be consistent:

   if ((!IS_ENABLED(CONFIG_64BIT) || ctx->compat) &&
        (p->flags & IORING_SETUP_SQPOLL))
            goto err;

I don't really see how taking away SQPOLL from 32-bit tasks is
any better than just preventing access to the known-broken files
as Al suggested, or adding the hack to make it work as in
Christoph's original patch.

Can we expect all existing and future user space to have a sane
fallback when IORING_SETUP_SQPOLL fails?

      Arnd

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-22  7:23                             ` Arnd Bergmann
@ 2020-09-22  7:57                               ` Pavel Begunkov
  2020-09-22  9:01                                 ` Arnd Bergmann
  0 siblings, 1 reply; 171+ messages in thread
From: Pavel Begunkov @ 2020-09-22  7:57 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andy Lutomirski, Christoph Hellwig, Al Viro, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On 22/09/2020 10:23, Arnd Bergmann wrote:
> On Tue, Sep 22, 2020 at 8:32 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>> On 22/09/2020 03:58, Andy Lutomirski wrote:
>>> On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>> I may be looking at a different kernel than you, but aren't you
>>> preventing creating an io_uring regardless of whether SQPOLL is
>>> requested?
>>
>> I diffed a not-saved file on a sleepy head, thanks for noticing.
>> As you said, there should be an SQPOLL check.
>>
>> ...
>> if (ctx->compat && (p->flags & IORING_SETUP_SQPOLL))
>>         goto err;
> 
> Wouldn't that mean that now 32-bit containers behave differently
> between compat and native execution?
> 
> I think if you want to prevent 32-bit applications from using SQPOLL,
> it needs to be done the same way on both to be consistent:

The intention was to disable only compat not native 32-bit.

> 
>    if ((!IS_ENABLED(CONFIG_64BIT) || ctx->compat) &&
>         (p->flags & IORING_SETUP_SQPOLL))
>             goto err;
> 
> I don't really see how taking away SQPOLL from 32-bit tasks is
> any better than just preventing access to the known-broken files
> as Al suggested, or adding the hack to make it work as in
> Christoph's original patch.

That's why I'm hoping that Christoph's work and the discussion will
reach consensus, but the bug should be patched in the end. IMHO,
it's a good and easy enough fallback option (temporal?).

> 
> Can we expect all existing and future user space to have a sane
> fallback when IORING_SETUP_SQPOLL fails?

SQPOLL has a few differences with non-SQPOLL modes, but it's easy
to convert between them. Anyway, SQPOLL is a privileged special
case that's here for performance/latency reasons, I don't think
there will be any non-accidental users of it.
-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-22  7:57                               ` Pavel Begunkov
@ 2020-09-22  9:01                                 ` Arnd Bergmann
  2020-09-22 16:20                                   ` Andy Lutomirski
  2020-09-23  8:01                                   ` Pavel Begunkov
  0 siblings, 2 replies; 171+ messages in thread
From: Arnd Bergmann @ 2020-09-22  9:01 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Andy Lutomirski, Christoph Hellwig, Al Viro, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On Tue, Sep 22, 2020 at 9:59 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> On 22/09/2020 10:23, Arnd Bergmann wrote:
> > On Tue, Sep 22, 2020 at 8:32 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >> On 22/09/2020 03:58, Andy Lutomirski wrote:
> >>> On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >>> I may be looking at a different kernel than you, but aren't you
> >>> preventing creating an io_uring regardless of whether SQPOLL is
> >>> requested?
> >>
> >> I diffed a not-saved file on a sleepy head, thanks for noticing.
> >> As you said, there should be an SQPOLL check.
> >>
> >> ...
> >> if (ctx->compat && (p->flags & IORING_SETUP_SQPOLL))
> >>         goto err;
> >
> > Wouldn't that mean that now 32-bit containers behave differently
> > between compat and native execution?
> >
> > I think if you want to prevent 32-bit applications from using SQPOLL,
> > it needs to be done the same way on both to be consistent:
>
> The intention was to disable only compat not native 32-bit.

I'm not following why that would be considered a valid option,
as that clearly breaks existing users that update from a 32-bit
kernel to a 64-bit one.

Taking away the features from users that are still on 32-bit kernels
already seems questionable to me, but being inconsistent
about it seems much worse, in particular when the regression
is on the upgrade path.

> > Can we expect all existing and future user space to have a sane
> > fallback when IORING_SETUP_SQPOLL fails?
>
> SQPOLL has a few differences with non-SQPOLL modes, but it's easy
> to convert between them. Anyway, SQPOLL is a privileged special
> case that's here for performance/latency reasons, I don't think
> there will be any non-accidental users of it.

Ok, so the behavior of 32-bit tasks would be the same as running
the same application as unprivileged 64-bit tasks, with applications
already having to implement that fallback, right?

      Arnd

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-22  9:01                                 ` Arnd Bergmann
@ 2020-09-22 16:20                                   ` Andy Lutomirski
  2020-09-23  8:01                                   ` Pavel Begunkov
  1 sibling, 0 replies; 171+ messages in thread
From: Andy Lutomirski @ 2020-09-22 16:20 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Pavel Begunkov, Andy Lutomirski, Christoph Hellwig, Al Viro,
	Andrew Morton, Jens Axboe, David Howells, linux-arm-kernel,
	X86 ML, LKML, open list:MIPS, Parisc List, linuxppc-dev,
	linux-s390, sparclinux, linux-block, Linux SCSI List,
	Linux FS Devel, linux-aio, io-uring, linux-arch, Linux-MM,
	Network Development, keyrings, LSM List



> On Sep 22, 2020, at 2:01 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> 
> On Tue, Sep 22, 2020 at 9:59 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>> On 22/09/2020 10:23, Arnd Bergmann wrote:
>>> On Tue, Sep 22, 2020 at 8:32 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>> On 22/09/2020 03:58, Andy Lutomirski wrote:
>>>>> On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>> I may be looking at a different kernel than you, but aren't you
>>>>> preventing creating an io_uring regardless of whether SQPOLL is
>>>>> requested?
>>>> 
>>>> I diffed a not-saved file on a sleepy head, thanks for noticing.
>>>> As you said, there should be an SQPOLL check.
>>>> 
>>>> ...
>>>> if (ctx->compat && (p->flags & IORING_SETUP_SQPOLL))
>>>>        goto err;
>>> 
>>> Wouldn't that mean that now 32-bit containers behave differently
>>> between compat and native execution?
>>> 
>>> I think if you want to prevent 32-bit applications from using SQPOLL,
>>> it needs to be done the same way on both to be consistent:
>> 
>> The intention was to disable only compat not native 32-bit.
> 
> I'm not following why that would be considered a valid option,
> as that clearly breaks existing users that update from a 32-bit
> kernel to a 64-bit one.
> 
> Taking away the features from users that are still on 32-bit kernels
> already seems questionable to me, but being inconsistent
> about it seems much worse, in particular when the regression
> is on the upgrade path.
> 
>>> Can we expect all existing and future user space to have a sane
>>> fallback when IORING_SETUP_SQPOLL fails?
>> 
>> SQPOLL has a few differences with non-SQPOLL modes, but it's easy
>> to convert between them. Anyway, SQPOLL is a privileged special
>> case that's here for performance/latency reasons, I don't think
>> there will be any non-accidental users of it.
> 
> Ok, so the behavior of 32-bit tasks would be the same as running
> the same application as unprivileged 64-bit tasks, with applications
> already having to implement that fallback, right?
> 
> 

I don’t have any real preference wrt SQPOLL, and it may be that we have a problem even without SQPOLL when IO gets punted without one of the fixes discussed.

But banning the mismatched io_uring and io_uring_enter seems like it may be worthwhile regardless.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 1/9] compat.h: fix a spelling error in <linux/compat.h>
  2020-09-23  6:05 let import_iovec deal with compat_iovecs as well v3 Christoph Hellwig
@ 2020-09-23  6:05 ` Christoph Hellwig
  2020-09-23  6:05 ` [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c Christoph Hellwig
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-23  6:05 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

There is no compat_sys_readv64v2 syscall, only a compat_sys_preadv64v2
one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/compat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index b354ce58966e2d..654c1ec36671a4 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -812,7 +812,7 @@ asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
 		const struct compat_iovec __user *vec,
 		compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
-asmlinkage long  compat_sys_readv64v2(unsigned long fd,
+asmlinkage long  compat_sys_preadv64v2(unsigned long fd,
 		const struct compat_iovec __user *vec,
 		unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
  2020-09-23  6:05 let import_iovec deal with compat_iovecs as well v3 Christoph Hellwig
  2020-09-23  6:05 ` [PATCH 1/9] compat.h: fix a spelling error in <linux/compat.h> Christoph Hellwig
@ 2020-09-23  6:05 ` Christoph Hellwig
  2020-09-23  6:05 ` [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec Christoph Hellwig
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-23  6:05 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module, David Laight,
	David Laight

From: David Laight <David.Laight@ACULAB.COM>

This lets the compiler inline it into import_iovec() generating
much better code.

Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 179 ------------------------------------------------
 lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 176 insertions(+), 179 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c78d0dd..e5e891a88442ef 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -752,185 +752,6 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
 	return ret;
 }
 
-/**
- * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
- *     into the kernel and check that it is valid.
- *
- * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
- * @uvector: Pointer to the userspace array.
- * @nr_segs: Number of elements in userspace array.
- * @fast_segs: Number of elements in @fast_pointer.
- * @fast_pointer: Pointer to (usually small on-stack) kernel array.
- * @ret_pointer: (output parameter) Pointer to a variable that will point to
- *     either @fast_pointer, a newly allocated kernel array, or NULL,
- *     depending on which array was used.
- *
- * This function copies an array of &struct iovec of @nr_segs from
- * userspace into the kernel and checks that each element is valid (e.g.
- * it does not point to a kernel address or cause overflow by being too
- * large, etc.).
- *
- * As an optimization, the caller may provide a pointer to a small
- * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
- * (the size of this array, or 0 if unused, should be given in @fast_segs).
- *
- * @ret_pointer will always point to the array that was used, so the
- * caller must take care not to call kfree() on it e.g. in case the
- * @fast_pointer array was used and it was allocated on the stack.
- *
- * Return: The total number of bytes covered by the iovec array on success
- *   or a negative error code on error.
- */
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
-			      unsigned long nr_segs, unsigned long fast_segs,
-			      struct iovec *fast_pointer,
-			      struct iovec **ret_pointer)
-{
-	unsigned long seg;
-	ssize_t ret;
-	struct iovec *iov = fast_pointer;
-
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs = 0) {
-		ret = 0;
-		goto out;
-	}
-
-	/*
-	 * First get the "struct iovec" from user memory and
-	 * verify all the pointers
-	 */
-	if (nr_segs > UIO_MAXIOV) {
-		ret = -EINVAL;
-		goto out;
-	}
-	if (nr_segs > fast_segs) {
-		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL) {
-			ret = -ENOMEM;
-			goto out;
-		}
-	}
-	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	/*
-	 * According to the Single Unix Specification we should return EINVAL
-	 * if an element length is < 0 when cast to ssize_t or if the
-	 * total length would overflow the ssize_t return value of the
-	 * system call.
-	 *
-	 * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
-	 * overflow case.
-	 */
-	ret = 0;
-	for (seg = 0; seg < nr_segs; seg++) {
-		void __user *buf = iov[seg].iov_base;
-		ssize_t len = (ssize_t)iov[seg].iov_len;
-
-		/* see if we we're about to use an invalid len or if
-		 * it's about to overflow ssize_t */
-		if (len < 0) {
-			ret = -EINVAL;
-			goto out;
-		}
-		if (type >= 0
-		    && unlikely(!access_ok(buf, len))) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - ret) {
-			len = MAX_RW_COUNT - ret;
-			iov[seg].iov_len = len;
-		}
-		ret += len;
-	}
-out:
-	*ret_pointer = iov;
-	return ret;
-}
-
-#ifdef CONFIG_COMPAT
-ssize_t compat_rw_copy_check_uvector(int type,
-		const struct compat_iovec __user *uvector, unsigned long nr_segs,
-		unsigned long fast_segs, struct iovec *fast_pointer,
-		struct iovec **ret_pointer)
-{
-	compat_ssize_t tot_len;
-	struct iovec *iov = *ret_pointer = fast_pointer;
-	ssize_t ret = 0;
-	int seg;
-
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs = 0)
-		goto out;
-
-	ret = -EINVAL;
-	if (nr_segs > UIO_MAXIOV)
-		goto out;
-	if (nr_segs > fast_segs) {
-		ret = -ENOMEM;
-		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL)
-			goto out;
-	}
-	*ret_pointer = iov;
-
-	ret = -EFAULT;
-	if (!access_ok(uvector, nr_segs*sizeof(*uvector)))
-		goto out;
-
-	/*
-	 * Single unix specification:
-	 * We should -EINVAL if an element length is not >= 0 and fitting an
-	 * ssize_t.
-	 *
-	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
-	 * no overflow possibility.
-	 */
-	tot_len = 0;
-	ret = -EINVAL;
-	for (seg = 0; seg < nr_segs; seg++) {
-		compat_uptr_t buf;
-		compat_ssize_t len;
-
-		if (__get_user(len, &uvector->iov_len) ||
-		   __get_user(buf, &uvector->iov_base)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
-			goto out;
-		if (type >= 0 &&
-		    !access_ok(compat_ptr(buf), len)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - tot_len)
-			len = MAX_RW_COUNT - tot_len;
-		tot_len += len;
-		iov->iov_base = compat_ptr(buf);
-		iov->iov_len = (compat_size_t) len;
-		uvector++;
-		iov++;
-	}
-	ret = tot_len;
-
-out:
-	return ret;
-}
-#endif
-
 static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
 		loff_t *pos, rwf_t flags)
 {
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5e40786c8f1232..ccea9db3f72be8 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1650,6 +1650,109 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 }
 EXPORT_SYMBOL(dup_iter);
 
+/**
+ * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
+ *     into the kernel and check that it is valid.
+ *
+ * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
+ * @uvector: Pointer to the userspace array.
+ * @nr_segs: Number of elements in userspace array.
+ * @fast_segs: Number of elements in @fast_pointer.
+ * @fast_pointer: Pointer to (usually small on-stack) kernel array.
+ * @ret_pointer: (output parameter) Pointer to a variable that will point to
+ *     either @fast_pointer, a newly allocated kernel array, or NULL,
+ *     depending on which array was used.
+ *
+ * This function copies an array of &struct iovec of @nr_segs from
+ * userspace into the kernel and checks that each element is valid (e.g.
+ * it does not point to a kernel address or cause overflow by being too
+ * large, etc.).
+ *
+ * As an optimization, the caller may provide a pointer to a small
+ * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
+ * (the size of this array, or 0 if unused, should be given in @fast_segs).
+ *
+ * @ret_pointer will always point to the array that was used, so the
+ * caller must take care not to call kfree() on it e.g. in case the
+ * @fast_pointer array was used and it was allocated on the stack.
+ *
+ * Return: The total number of bytes covered by the iovec array on success
+ *   or a negative error code on error.
+ */
+ssize_t rw_copy_check_uvector(int type, const struct iovec __user *uvector,
+		unsigned long nr_segs, unsigned long fast_segs,
+		struct iovec *fast_pointer, struct iovec **ret_pointer)
+{
+	unsigned long seg;
+	ssize_t ret;
+	struct iovec *iov = fast_pointer;
+
+	/*
+	 * SuS says "The readv() function *may* fail if the iovcnt argument
+	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
+	 * traditionally returned zero for zero segments, so...
+	 */
+	if (nr_segs = 0) {
+		ret = 0;
+		goto out;
+	}
+
+	/*
+	 * First get the "struct iovec" from user memory and
+	 * verify all the pointers
+	 */
+	if (nr_segs > UIO_MAXIOV) {
+		ret = -EINVAL;
+		goto out;
+	}
+	if (nr_segs > fast_segs) {
+		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
+		if (iov = NULL) {
+			ret = -ENOMEM;
+			goto out;
+		}
+	}
+	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	/*
+	 * According to the Single Unix Specification we should return EINVAL
+	 * if an element length is < 0 when cast to ssize_t or if the
+	 * total length would overflow the ssize_t return value of the
+	 * system call.
+	 *
+	 * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
+	 * overflow case.
+	 */
+	ret = 0;
+	for (seg = 0; seg < nr_segs; seg++) {
+		void __user *buf = iov[seg].iov_base;
+		ssize_t len = (ssize_t)iov[seg].iov_len;
+
+		/* see if we we're about to use an invalid len or if
+		 * it's about to overflow ssize_t */
+		if (len < 0) {
+			ret = -EINVAL;
+			goto out;
+		}
+		if (type >= 0
+		    && unlikely(!access_ok(buf, len))) {
+			ret = -EFAULT;
+			goto out;
+		}
+		if (len > MAX_RW_COUNT - ret) {
+			len = MAX_RW_COUNT - ret;
+			iov[seg].iov_len = len;
+		}
+		ret += len;
+	}
+out:
+	*ret_pointer = iov;
+	return ret;
+}
+
 /**
  * import_iovec() - Copy an array of &struct iovec from userspace
  *     into the kernel, check that it is valid, and initialize a new
@@ -1695,6 +1798,79 @@ EXPORT_SYMBOL(import_iovec);
 #ifdef CONFIG_COMPAT
 #include <linux/compat.h>
 
+ssize_t compat_rw_copy_check_uvector(int type,
+		const struct compat_iovec __user *uvector,
+		unsigned long nr_segs, unsigned long fast_segs,
+		struct iovec *fast_pointer, struct iovec **ret_pointer)
+{
+	compat_ssize_t tot_len;
+	struct iovec *iov = *ret_pointer = fast_pointer;
+	ssize_t ret = 0;
+	int seg;
+
+	/*
+	 * SuS says "The readv() function *may* fail if the iovcnt argument
+	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
+	 * traditionally returned zero for zero segments, so...
+	 */
+	if (nr_segs = 0)
+		goto out;
+
+	ret = -EINVAL;
+	if (nr_segs > UIO_MAXIOV)
+		goto out;
+	if (nr_segs > fast_segs) {
+		ret = -ENOMEM;
+		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
+		if (iov = NULL)
+			goto out;
+	}
+	*ret_pointer = iov;
+
+	ret = -EFAULT;
+	if (!access_ok(uvector, nr_segs*sizeof(*uvector)))
+		goto out;
+
+	/*
+	 * Single unix specification:
+	 * We should -EINVAL if an element length is not >= 0 and fitting an
+	 * ssize_t.
+	 *
+	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
+	 * no overflow possibility.
+	 */
+	tot_len = 0;
+	ret = -EINVAL;
+	for (seg = 0; seg < nr_segs; seg++) {
+		compat_uptr_t buf;
+		compat_ssize_t len;
+
+		if (__get_user(len, &uvector->iov_len) ||
+		   __get_user(buf, &uvector->iov_base)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
+			goto out;
+		if (type >= 0 &&
+		    !access_ok(compat_ptr(buf), len)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		if (len > MAX_RW_COUNT - tot_len)
+			len = MAX_RW_COUNT - tot_len;
+		tot_len += len;
+		iov->iov_base = compat_ptr(buf);
+		iov->iov_len = (compat_size_t) len;
+		uvector++;
+		iov++;
+	}
+	ret = tot_len;
+
+out:
+	return ret;
+}
+
 ssize_t compat_import_iovec(int type,
 		const struct compat_iovec __user * uvector,
 		unsigned nr_segs, unsigned fast_segs,
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec
  2020-09-23  6:05 let import_iovec deal with compat_iovecs as well v3 Christoph Hellwig
  2020-09-23  6:05 ` [PATCH 1/9] compat.h: fix a spelling error in <linux/compat.h> Christoph Hellwig
  2020-09-23  6:05 ` [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c Christoph Hellwig
@ 2020-09-23  6:05 ` Christoph Hellwig
  2020-09-23 14:16   ` Al Viro
  2020-09-23  6:05 ` [PATCH 4/9] iov_iter: transparently handle compat iovecs in import_iovec Christoph Hellwig
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-23  6:05 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Split rw_copy_check_uvector into two new helpers with more sensible
calling conventions:

 - iovec_from_user copies a iovec from userspace either into the provided
   stack buffer if it fits, or allocates a new buffer for it.  Returns
   the actually used iovec.  It also verifies that iov_len does fit a
   signed type, and handles compat iovecs if the compat flag is set.
 - __import_iovec consolidates the native and compat versions of
   import_iovec. It calls iovec_from_user, then validates each iovec
   actually points to user addresses, and ensures the total length
   doesn't overflow.

This has two major implications:

 - the access_process_vm case loses the total lenght checking, which
   wasn't required anyway, given that each call receives two iovecs
   for the local and remote side of the operation, and it verifies
   the total length on the local side already.
 - instead of a single loop there now are two loops over the iovecs.
   Given that the iovecs are cache hot this doesn't make a major
   difference

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/compat.h |   6 -
 include/linux/fs.h     |  13 --
 include/linux/uio.h    |  12 +-
 lib/iov_iter.c         | 300 ++++++++++++++++-------------------------
 mm/process_vm_access.c |  34 +++--
 5 files changed, 138 insertions(+), 227 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 654c1ec36671a4..b930de791ff16b 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -451,12 +451,6 @@ extern long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 
 struct epoll_event;	/* fortunately, this one is fixed-layout */
 
-extern ssize_t compat_rw_copy_check_uvector(int type,
-		const struct compat_iovec __user *uvector,
-		unsigned long nr_segs,
-		unsigned long fast_segs, struct iovec *fast_pointer,
-		struct iovec **ret_pointer);
-
 extern void __user *compat_alloc_user_space(unsigned long len);
 
 int compat_restore_altstack(const compat_stack_t __user *uss);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a082c..e69b45b6cc7b5f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -178,14 +178,6 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File supports async buffered reads */
 #define FMODE_BUF_RASYNC	((__force fmode_t)0x40000000)
 
-/*
- * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
- * that indicates that they should check the contents of the iovec are
- * valid, but not check the memory that the iovec elements
- * points too.
- */
-#define CHECK_IOVEC_ONLY -1
-
 /*
  * Attribute flags.  These should be or-ed together to figure out what
  * has been changed!
@@ -1887,11 +1879,6 @@ static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
 	return file->f_op->mmap(file, vma);
 }
 
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
-			      unsigned long nr_segs, unsigned long fast_segs,
-			      struct iovec *fast_pointer,
-			      struct iovec **ret_pointer);
-
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 3835a8a8e9eae0..92c11fe41c6228 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -266,9 +266,15 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct
 size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
 		struct iov_iter *i);
 
-ssize_t import_iovec(int type, const struct iovec __user * uvector,
-		 unsigned nr_segs, unsigned fast_segs,
-		 struct iovec **iov, struct iov_iter *i);
+struct iovec *iovec_from_user(const struct iovec __user *uvector,
+		unsigned long nr_segs, unsigned long fast_segs,
+		struct iovec *fast_iov, bool compat);
+ssize_t import_iovec(int type, const struct iovec __user *uvec,
+		 unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+		 struct iov_iter *i);
+ssize_t __import_iovec(int type, const struct iovec __user *uvec,
+		 unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+		 struct iov_iter *i, bool compat);
 
 #ifdef CONFIG_COMPAT
 struct compat_iovec;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ccea9db3f72be8..d5d8afe31fca16 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -7,6 +7,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/splice.h>
+#include <linux/compat.h>
 #include <net/checksum.h>
 #include <linux/scatterlist.h>
 #include <linux/instrumented.h>
@@ -1650,107 +1651,133 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 }
 EXPORT_SYMBOL(dup_iter);
 
-/**
- * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
- *     into the kernel and check that it is valid.
- *
- * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
- * @uvector: Pointer to the userspace array.
- * @nr_segs: Number of elements in userspace array.
- * @fast_segs: Number of elements in @fast_pointer.
- * @fast_pointer: Pointer to (usually small on-stack) kernel array.
- * @ret_pointer: (output parameter) Pointer to a variable that will point to
- *     either @fast_pointer, a newly allocated kernel array, or NULL,
- *     depending on which array was used.
- *
- * This function copies an array of &struct iovec of @nr_segs from
- * userspace into the kernel and checks that each element is valid (e.g.
- * it does not point to a kernel address or cause overflow by being too
- * large, etc.).
- *
- * As an optimization, the caller may provide a pointer to a small
- * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
- * (the size of this array, or 0 if unused, should be given in @fast_segs).
- *
- * @ret_pointer will always point to the array that was used, so the
- * caller must take care not to call kfree() on it e.g. in case the
- * @fast_pointer array was used and it was allocated on the stack.
- *
- * Return: The total number of bytes covered by the iovec array on success
- *   or a negative error code on error.
- */
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user *uvector,
-		unsigned long nr_segs, unsigned long fast_segs,
-		struct iovec *fast_pointer, struct iovec **ret_pointer)
+static int copy_compat_iovec_from_user(struct iovec *iov,
+		const struct iovec __user *uvec, unsigned long nr_segs)
+{
+	const struct compat_iovec __user *uiov +		(const struct compat_iovec __user *)uvec;
+	int ret = -EFAULT, i;
+
+	if (!user_access_begin(uvec, nr_segs * sizeof(*uvec)))
+		return -EFAULT;
+
+	for (i = 0; i < nr_segs; i++) {
+		compat_uptr_t buf;
+		compat_ssize_t len;
+
+		unsafe_get_user(len, &uiov[i].iov_len, uaccess_end);
+		unsafe_get_user(buf, &uiov[i].iov_base, uaccess_end);
+
+		/* check for compat_size_t not fitting in compat_ssize_t .. */
+		if (len < 0) {
+			ret = -EINVAL;
+			goto uaccess_end;
+		}
+		iov[i].iov_base = compat_ptr(buf);
+		iov[i].iov_len = len;
+	}
+
+	ret = 0;
+uaccess_end:
+	user_access_end();
+	return ret;
+}
+		
+static int copy_iovec_from_user(struct iovec *iov,
+		const struct iovec __user *uvec, unsigned long nr_segs)
 {
 	unsigned long seg;
-	ssize_t ret;
-	struct iovec *iov = fast_pointer;
 
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs = 0) {
-		ret = 0;
-		goto out;
+	if (copy_from_user(iov, uvec, nr_segs * sizeof(*uvec)))
+		return -EFAULT;
+	for (seg = 0; seg < nr_segs; seg++) {
+		if ((ssize_t)iov[seg].iov_len < 0)
+			return -EINVAL;
 	}
 
+	return 0;
+}
+
+struct iovec *iovec_from_user(const struct iovec __user *uvec,
+		unsigned long nr_segs, unsigned long fast_segs,
+		struct iovec *fast_iov, bool compat)
+{
+	struct iovec *iov = fast_iov;
+	int ret;
+
 	/*
-	 * First get the "struct iovec" from user memory and
-	 * verify all the pointers
+	 * SuS says "The readv() function *may* fail if the iovcnt argument was
+	 * less than or equal to 0, or greater than {IOV_MAX}.  Linux has
+	 * traditionally returned zero for zero segments, so...
 	 */
-	if (nr_segs > UIO_MAXIOV) {
-		ret = -EINVAL;
-		goto out;
-	}
+	if (nr_segs = 0)
+		return iov;
+	if (nr_segs > UIO_MAXIOV)
+		return ERR_PTR(-EINVAL);
 	if (nr_segs > fast_segs) {
 		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		if (!iov)
+			return ERR_PTR(-ENOMEM);
 	}
-	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
-		ret = -EFAULT;
-		goto out;
+
+	if (compat)
+		ret = copy_compat_iovec_from_user(iov, uvec, nr_segs);
+	else
+		ret = copy_iovec_from_user(iov, uvec, nr_segs);
+	if (ret) {
+		if (iov != fast_iov)
+			kfree(iov);
+		return ERR_PTR(ret);
+	}
+
+	return iov;
+}
+
+ssize_t __import_iovec(int type, const struct iovec __user *uvec,
+		 unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+		 struct iov_iter *i, bool compat)
+{
+	ssize_t total_len = 0;
+	unsigned long seg;
+	struct iovec *iov;
+
+	iov = iovec_from_user(uvec, nr_segs, fast_segs, *iovp, compat);
+	if (IS_ERR(iov)) {
+		*iovp = NULL;
+		return PTR_ERR(iov);
 	}
 
 	/*
-	 * According to the Single Unix Specification we should return EINVAL
-	 * if an element length is < 0 when cast to ssize_t or if the
-	 * total length would overflow the ssize_t return value of the
-	 * system call.
+	 * According to the Single Unix Specification we should return EINVAL if
+	 * an element length is < 0 when cast to ssize_t or if the total length
+	 * would overflow the ssize_t return value of the system call.
 	 *
 	 * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
 	 * overflow case.
 	 */
-	ret = 0;
 	for (seg = 0; seg < nr_segs; seg++) {
-		void __user *buf = iov[seg].iov_base;
 		ssize_t len = (ssize_t)iov[seg].iov_len;
 
-		/* see if we we're about to use an invalid len or if
-		 * it's about to overflow ssize_t */
-		if (len < 0) {
-			ret = -EINVAL;
-			goto out;
+		if (!access_ok(iov[seg].iov_base, len)) {
+			if (iov != *iovp)
+				kfree(iov);
+			*iovp = NULL;
+			return -EFAULT;
 		}
-		if (type >= 0
-		    && unlikely(!access_ok(buf, len))) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - ret) {
-			len = MAX_RW_COUNT - ret;
+
+		if (len > MAX_RW_COUNT - total_len) {
+			len = MAX_RW_COUNT - total_len;
 			iov[seg].iov_len = len;
 		}
-		ret += len;
+		total_len += len;
 	}
-out:
-	*ret_pointer = iov;
-	return ret;
+
+	iov_iter_init(i, type, iov, nr_segs, total_len);
+	if (iov = *iovp)
+		*iovp = NULL;
+	else
+		*iovp = iov;
+	return total_len;
 }
 
 /**
@@ -1759,10 +1786,10 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user *uvector,
  *     &struct iov_iter iterator to access it.
  *
  * @type: One of %READ or %WRITE.
- * @uvector: Pointer to the userspace array.
+ * @uvec: Pointer to the userspace array.
  * @nr_segs: Number of elements in userspace array.
  * @fast_segs: Number of elements in @iov.
- * @iov: (input and output parameter) Pointer to pointer to (usually small
+ * @iovp: (input and output parameter) Pointer to pointer to (usually small
  *     on-stack) kernel array.
  * @i: Pointer to iterator that will be initialized on success.
  *
@@ -1775,120 +1802,21 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user *uvector,
  *
  * Return: Negative error code on error, bytes imported on success
  */
-ssize_t import_iovec(int type, const struct iovec __user * uvector,
+ssize_t import_iovec(int type, const struct iovec __user *uvec,
 		 unsigned nr_segs, unsigned fast_segs,
-		 struct iovec **iov, struct iov_iter *i)
+		 struct iovec **iovp, struct iov_iter *i)
 {
-	ssize_t n;
-	struct iovec *p;
-	n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
-				  *iov, &p);
-	if (n < 0) {
-		if (p != *iov)
-			kfree(p);
-		*iov = NULL;
-		return n;
-	}
-	iov_iter_init(i, type, p, nr_segs, n);
-	*iov = p = *iov ? NULL : p;
-	return n;
+	return __import_iovec(type, uvec, nr_segs, fast_segs, iovp, i, false);
 }
 EXPORT_SYMBOL(import_iovec);
 
 #ifdef CONFIG_COMPAT
-#include <linux/compat.h>
-
-ssize_t compat_rw_copy_check_uvector(int type,
-		const struct compat_iovec __user *uvector,
-		unsigned long nr_segs, unsigned long fast_segs,
-		struct iovec *fast_pointer, struct iovec **ret_pointer)
-{
-	compat_ssize_t tot_len;
-	struct iovec *iov = *ret_pointer = fast_pointer;
-	ssize_t ret = 0;
-	int seg;
-
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs = 0)
-		goto out;
-
-	ret = -EINVAL;
-	if (nr_segs > UIO_MAXIOV)
-		goto out;
-	if (nr_segs > fast_segs) {
-		ret = -ENOMEM;
-		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL)
-			goto out;
-	}
-	*ret_pointer = iov;
-
-	ret = -EFAULT;
-	if (!access_ok(uvector, nr_segs*sizeof(*uvector)))
-		goto out;
-
-	/*
-	 * Single unix specification:
-	 * We should -EINVAL if an element length is not >= 0 and fitting an
-	 * ssize_t.
-	 *
-	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
-	 * no overflow possibility.
-	 */
-	tot_len = 0;
-	ret = -EINVAL;
-	for (seg = 0; seg < nr_segs; seg++) {
-		compat_uptr_t buf;
-		compat_ssize_t len;
-
-		if (__get_user(len, &uvector->iov_len) ||
-		   __get_user(buf, &uvector->iov_base)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
-			goto out;
-		if (type >= 0 &&
-		    !access_ok(compat_ptr(buf), len)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - tot_len)
-			len = MAX_RW_COUNT - tot_len;
-		tot_len += len;
-		iov->iov_base = compat_ptr(buf);
-		iov->iov_len = (compat_size_t) len;
-		uvector++;
-		iov++;
-	}
-	ret = tot_len;
-
-out:
-	return ret;
-}
-
-ssize_t compat_import_iovec(int type,
-		const struct compat_iovec __user * uvector,
-		unsigned nr_segs, unsigned fast_segs,
-		struct iovec **iov, struct iov_iter *i)
+ssize_t compat_import_iovec(int type, const struct compat_iovec __user *uvec,
+		unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+		struct iov_iter *i)
 {
-	ssize_t n;
-	struct iovec *p;
-	n = compat_rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
-				  *iov, &p);
-	if (n < 0) {
-		if (p != *iov)
-			kfree(p);
-		*iov = NULL;
-		return n;
-	}
-	iov_iter_init(i, type, p, nr_segs, n);
-	*iov = p = *iov ? NULL : p;
-	return n;
+	return __import_iovec(type, (const struct iovec __user *)uvec, nr_segs,
+			     fast_segs, iovp, i, true);
 }
 EXPORT_SYMBOL(compat_import_iovec);
 #endif
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 29c052099affdc..5e728c20c2bead 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -276,20 +276,17 @@ static ssize_t process_vm_rw(pid_t pid,
 	if (rc < 0)
 		return rc;
 	if (!iov_iter_count(&iter))
-		goto free_iovecs;
-
-	rc = rw_copy_check_uvector(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV,
-				   iovstack_r, &iov_r);
-	if (rc <= 0)
-		goto free_iovecs;
-
+		goto free_iov_l;
+	iov_r = iovec_from_user(rvec, riovcnt, UIO_FASTIOV, iovstack_r, false);
+	if (IS_ERR(iov_r)) {
+		rc = PTR_ERR(iov_r);
+		goto free_iov_l;
+	}
 	rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
-
-free_iovecs:
 	if (iov_r != iovstack_r)
 		kfree(iov_r);
+free_iov_l:
 	kfree(iov_l);
-
 	return rc;
 }
 
@@ -333,18 +330,17 @@ compat_process_vm_rw(compat_pid_t pid,
 	if (rc < 0)
 		return rc;
 	if (!iov_iter_count(&iter))
-		goto free_iovecs;
-	rc = compat_rw_copy_check_uvector(CHECK_IOVEC_ONLY, rvec, riovcnt,
-					  UIO_FASTIOV, iovstack_r,
-					  &iov_r);
-	if (rc <= 0)
-		goto free_iovecs;
-
+		goto free_iov_l;
+	iov_r = iovec_from_user((const struct iovec __user *)rvec, riovcnt,
+				UIO_FASTIOV, iovstack_r, true);
+	if (IS_ERR(iov_r)) {
+		rc = PTR_ERR(iov_r);
+		goto free_iov_l;
+	}
 	rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
-
-free_iovecs:
 	if (iov_r != iovstack_r)
 		kfree(iov_r);
+free_iov_l:
 	kfree(iov_l);
 	return rc;
 }
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 4/9] iov_iter: transparently handle compat iovecs in import_iovec
  2020-09-23  6:05 let import_iovec deal with compat_iovecs as well v3 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2020-09-23  6:05 ` [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec Christoph Hellwig
@ 2020-09-23  6:05 ` Christoph Hellwig
  2020-09-23  6:05 ` [PATCH 6/9] fs: remove the compat readv/writev syscalls Christoph Hellwig
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-23  6:05 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Use in compat_syscall to import either native or the compat iovecs, and
remove the now superflous compat_import_iovec.

This removes the need for special compat logic in most callers, and
the remaining ones can still be simplified by using __import_iovec
with a bool compat parameter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/scsi_ioctl.c     | 12 ++----------
 drivers/scsi/sg.c      |  9 +--------
 fs/aio.c               |  8 ++------
 fs/io_uring.c          | 20 ++++++++------------
 fs/read_write.c        |  6 ++++--
 fs/splice.c            |  2 +-
 include/linux/uio.h    |  8 --------
 lib/iov_iter.c         | 14 ++------------
 mm/process_vm_access.c |  3 ++-
 net/compat.c           |  4 ++--
 security/keys/compat.c |  5 ++---
 11 files changed, 26 insertions(+), 65 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index ef722f04f88a93..e08df86866ee5d 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -333,16 +333,8 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
 		struct iov_iter i;
 		struct iovec *iov = NULL;
 
-#ifdef CONFIG_COMPAT
-		if (in_compat_syscall())
-			ret = compat_import_iovec(rq_data_dir(rq),
-				   hdr->dxferp, hdr->iovec_count,
-				   0, &iov, &i);
-		else
-#endif
-			ret = import_iovec(rq_data_dir(rq),
-				   hdr->dxferp, hdr->iovec_count,
-				   0, &iov, &i);
+		ret = import_iovec(rq_data_dir(rq), hdr->dxferp,
+				   hdr->iovec_count, 0, &iov, &i);
 		if (ret < 0)
 			goto out_free_cdb;
 
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 20472aaaf630a4..bfa8d77322d732 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1820,14 +1820,7 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
 		struct iovec *iov = NULL;
 		struct iov_iter i;
 
-#ifdef CONFIG_COMPAT
-		if (in_compat_syscall())
-			res = compat_import_iovec(rw, hp->dxferp, iov_count,
-						  0, &iov, &i);
-		else
-#endif
-			res = import_iovec(rw, hp->dxferp, iov_count,
-					   0, &iov, &i);
+		res = import_iovec(rw, hp->dxferp, iov_count, 0, &iov, &i);
 		if (res < 0)
 			return res;
 
diff --git a/fs/aio.c b/fs/aio.c
index d5ec303855669d..c45c20d875388c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1489,12 +1489,8 @@ static ssize_t aio_setup_rw(int rw, const struct iocb *iocb,
 		*iovec = NULL;
 		return ret;
 	}
-#ifdef CONFIG_COMPAT
-	if (compat)
-		return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec,
-				iter);
-#endif
-	return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
+
+	return __import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter, compat);
 }
 
 static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3790c7fe9fee22..ba84ecea7cb1a4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2837,13 +2837,8 @@ static ssize_t __io_import_iovec(int rw, struct io_kiocb *req,
 		return ret;
 	}
 
-#ifdef CONFIG_COMPAT
-	if (req->ctx->compat)
-		return compat_import_iovec(rw, buf, sqe_len, UIO_FASTIOV,
-						iovec, iter);
-#endif
-
-	return import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter);
+	return __import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter,
+			      req->ctx->compat);
 }
 
 static ssize_t io_import_iovec(int rw, struct io_kiocb *req,
@@ -4179,8 +4174,9 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
 				sr->len);
 		iomsg->iov = NULL;
 	} else {
-		ret = import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
-					&iomsg->iov, &iomsg->msg.msg_iter);
+		ret = __import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
+				     &iomsg->iov, &iomsg->msg.msg_iter,
+				     false);
 		if (ret > 0)
 			ret = 0;
 	}
@@ -4220,9 +4216,9 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
 		sr->len = iomsg->iov[0].iov_len;
 		iomsg->iov = NULL;
 	} else {
-		ret = compat_import_iovec(READ, uiov, len, UIO_FASTIOV,
-						&iomsg->iov,
-						&iomsg->msg.msg_iter);
+		ret = __import_iovec(READ, (struct iovec __user *)uiov, len,
+				   UIO_FASTIOV, &iomsg->iov,
+				   &iomsg->msg.msg_iter, true);
 		if (ret < 0)
 			return ret;
 	}
diff --git a/fs/read_write.c b/fs/read_write.c
index e5e891a88442ef..0a68037580b455 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1078,7 +1078,8 @@ static size_t compat_readv(struct file *file,
 	struct iov_iter iter;
 	ssize_t ret;
 
-	ret = compat_import_iovec(READ, vec, vlen, UIO_FASTIOV, &iov, &iter);
+	ret = import_iovec(READ, (const struct iovec __user *)vec, vlen,
+			   UIO_FASTIOV, &iov, &iter);
 	if (ret >= 0) {
 		ret = do_iter_read(file, &iter, pos, flags);
 		kfree(iov);
@@ -1186,7 +1187,8 @@ static size_t compat_writev(struct file *file,
 	struct iov_iter iter;
 	ssize_t ret;
 
-	ret = compat_import_iovec(WRITE, vec, vlen, UIO_FASTIOV, &iov, &iter);
+	ret = import_iovec(WRITE, (const struct iovec __user *)vec, vlen,
+			   UIO_FASTIOV, &iov, &iter);
 	if (ret >= 0) {
 		file_start_write(file);
 		ret = do_iter_write(file, &iter, pos, flags);
diff --git a/fs/splice.c b/fs/splice.c
index d7c8a7c4db07ff..132d42b9871f9b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1387,7 +1387,7 @@ COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, io
 	if (error)
 		return error;
 
-	error = compat_import_iovec(type, iov32, nr_segs,
+	error = import_iovec(type, (struct iovec __user *)iov32, nr_segs,
 			     ARRAY_SIZE(iovstack), &iov, &iter);
 	if (error >= 0) {
 		error = do_vmsplice(f.file, &iter, flags);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 92c11fe41c6228..daedc61ad3706e 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -275,14 +275,6 @@ ssize_t import_iovec(int type, const struct iovec __user *uvec,
 ssize_t __import_iovec(int type, const struct iovec __user *uvec,
 		 unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
 		 struct iov_iter *i, bool compat);
-
-#ifdef CONFIG_COMPAT
-struct compat_iovec;
-ssize_t compat_import_iovec(int type, const struct compat_iovec __user * uvector,
-		 unsigned nr_segs, unsigned fast_segs,
-		 struct iovec **iov, struct iov_iter *i);
-#endif
-
 int import_single_range(int type, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i);
 
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index d5d8afe31fca16..8c51e1b03814a3 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1806,21 +1806,11 @@ ssize_t import_iovec(int type, const struct iovec __user *uvec,
 		 unsigned nr_segs, unsigned fast_segs,
 		 struct iovec **iovp, struct iov_iter *i)
 {
-	return __import_iovec(type, uvec, nr_segs, fast_segs, iovp, i, false);
+	return __import_iovec(type, uvec, nr_segs, fast_segs, iovp, i,
+			      in_compat_syscall());
 }
 EXPORT_SYMBOL(import_iovec);
 
-#ifdef CONFIG_COMPAT
-ssize_t compat_import_iovec(int type, const struct compat_iovec __user *uvec,
-		unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
-		struct iov_iter *i)
-{
-	return __import_iovec(type, (const struct iovec __user *)uvec, nr_segs,
-			     fast_segs, iovp, i, true);
-}
-EXPORT_SYMBOL(compat_import_iovec);
-#endif
-
 int import_single_range(int rw, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i)
 {
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 5e728c20c2bead..3f2156aab44263 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -326,7 +326,8 @@ compat_process_vm_rw(compat_pid_t pid,
 	if (flags != 0)
 		return -EINVAL;
 
-	rc = compat_import_iovec(dir, lvec, liovcnt, UIO_FASTIOV, &iov_l, &iter);
+	rc = import_iovec(dir, (const struct iovec __user *)lvec, liovcnt,
+			  UIO_FASTIOV, &iov_l, &iter);
 	if (rc < 0)
 		return rc;
 	if (!iov_iter_count(&iter))
diff --git a/net/compat.c b/net/compat.c
index 95ce707a30a31d..ddd15af3a2837b 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -98,8 +98,8 @@ int get_compat_msghdr(struct msghdr *kmsg,
 	if (err)
 		return err;
 
-	err = compat_import_iovec(save_addr ? READ : WRITE, compat_ptr(ptr),
-				   len, UIO_FASTIOV, iov, &kmsg->msg_iter);
+	err = import_iovec(save_addr ? READ : WRITE, compat_ptr(ptr), len,
+			   UIO_FASTIOV, iov, &kmsg->msg_iter);
 	return err < 0 ? err : 0;
 }
 
diff --git a/security/keys/compat.c b/security/keys/compat.c
index 6ee9d8f6a4a5bb..7ae531db031cf8 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -33,9 +33,8 @@ static long compat_keyctl_instantiate_key_iov(
 	if (!_payload_iov)
 		ioc = 0;
 
-	ret = compat_import_iovec(WRITE, _payload_iov, ioc,
-				  ARRAY_SIZE(iovstack), &iov,
-				  &from);
+	ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
+			   ioc, ARRAY_SIZE(iovstack), &iov, &from);
 	if (ret < 0)
 		return ret;
 
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 6/9] fs: remove the compat readv/writev syscalls
  2020-09-23  6:05 let import_iovec deal with compat_iovecs as well v3 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2020-09-23  6:05 ` [PATCH 4/9] iov_iter: transparently handle compat iovecs in import_iovec Christoph Hellwig
@ 2020-09-23  6:05 ` Christoph Hellwig
  2020-09-23  6:05 ` [PATCH 7/9] fs: remove compat_sys_vmsplice Christoph Hellwig
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-23  6:05 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native readv and writev
syscalls can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/unistd32.h                  |  4 ++--
 arch/mips/kernel/syscalls/syscall_n32.tbl          |  4 ++--
 arch/mips/kernel/syscalls/syscall_o32.tbl          |  4 ++--
 arch/parisc/kernel/syscalls/syscall.tbl            |  4 ++--
 arch/powerpc/kernel/syscalls/syscall.tbl           |  4 ++--
 arch/s390/kernel/syscalls/syscall.tbl              |  4 ++--
 arch/sparc/kernel/syscalls/syscall.tbl             |  4 ++--
 arch/x86/entry/syscall_x32.c                       |  2 ++
 arch/x86/entry/syscalls/syscall_32.tbl             |  4 ++--
 arch/x86/entry/syscalls/syscall_64.tbl             |  4 ++--
 fs/read_write.c                                    | 14 --------------
 include/linux/compat.h                             |  4 ----
 include/uapi/asm-generic/unistd.h                  |  4 ++--
 tools/include/uapi/asm-generic/unistd.h            |  4 ++--
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |  4 ++--
 tools/perf/arch/s390/entry/syscalls/syscall.tbl    |  4 ++--
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |  4 ++--
 17 files changed, 30 insertions(+), 46 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 734860ac7cf9d5..4a236493dca5b9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -301,9 +301,9 @@ __SYSCALL(__NR_flock, sys_flock)
 #define __NR_msync 144
 __SYSCALL(__NR_msync, sys_msync)
 #define __NR_readv 145
-__SYSCALL(__NR_readv, compat_sys_readv)
+__SYSCALL(__NR_readv, sys_readv)
 #define __NR_writev 146
-__SYSCALL(__NR_writev, compat_sys_writev)
+__SYSCALL(__NR_writev, sys_writev)
 #define __NR_getsid 147
 __SYSCALL(__NR_getsid, sys_getsid)
 #define __NR_fdatasync 148
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f9df9edb67a407..c99a92646f8ee9 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -25,8 +25,8 @@
 15	n32	ioctl				compat_sys_ioctl
 16	n32	pread64				sys_pread64
 17	n32	pwrite64			sys_pwrite64
-18	n32	readv				compat_sys_readv
-19	n32	writev				compat_sys_writev
+18	n32	readv				sys_readv
+19	n32	writev				sys_writev
 20	n32	access				sys_access
 21	n32	pipe				sysm_pipe
 22	n32	_newselect			compat_sys_select
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 195b43cf27c848..075064d10661bf 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -156,8 +156,8 @@
 142	o32	_newselect			sys_select			compat_sys_select
 143	o32	flock				sys_flock
 144	o32	msync				sys_msync
-145	o32	readv				sys_readv			compat_sys_readv
-146	o32	writev				sys_writev			compat_sys_writev
+145	o32	readv				sys_readv
+146	o32	writev				sys_writev
 147	o32	cacheflush			sys_cacheflush
 148	o32	cachectl			sys_cachectl
 149	o32	sysmips				__sys_sysmips
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index def64d221cd4fb..192abde0001d9d 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -159,8 +159,8 @@
 142	common	_newselect		sys_select			compat_sys_select
 143	common	flock			sys_flock
 144	common	msync			sys_msync
-145	common	readv			sys_readv			compat_sys_readv
-146	common	writev			sys_writev			compat_sys_writev
+145	common	readv			sys_readv
+146	common	writev			sys_writev
 147	common	getsid			sys_getsid
 148	common	fdatasync		sys_fdatasync
 149	common	_sysctl			sys_ni_syscall
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index c2d737ff2e7bec..6f1e2ecf0edad9 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -193,8 +193,8 @@
 142	common	_newselect			sys_select			compat_sys_select
 143	common	flock				sys_flock
 144	common	msync				sys_msync
-145	common	readv				sys_readv			compat_sys_readv
-146	common	writev				sys_writev			compat_sys_writev
+145	common	readv				sys_readv
+146	common	writev				sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
 149	nospu	_sysctl				sys_ni_syscall
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 10456bc936fb09..6101cf2e004cb4 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -134,8 +134,8 @@
 142  64		select			sys_select			-
 143  common	flock			sys_flock			sys_flock
 144  common	msync			sys_msync			sys_msync
-145  common	readv			sys_readv			compat_sys_readv
-146  common	writev			sys_writev			compat_sys_writev
+145  common	readv			sys_readv			sys_readv
+146  common	writev			sys_writev			sys_writev
 147  common	getsid			sys_getsid			sys_getsid
 148  common	fdatasync		sys_fdatasync			sys_fdatasync
 149  common	_sysctl			-				-
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 4af114e84f2022..a87ddb282ab16f 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -149,8 +149,8 @@
 117	common	getrusage		sys_getrusage			compat_sys_getrusage
 118	common	getsockopt		sys_getsockopt			sys_getsockopt
 119	common	getcwd			sys_getcwd
-120	common	readv			sys_readv			compat_sys_readv
-121	common	writev			sys_writev			compat_sys_writev
+120	common	readv			sys_readv
+121	common	writev			sys_writev
 122	common	settimeofday		sys_settimeofday		compat_sys_settimeofday
 123	32	fchown			sys_fchown16
 123	64	fchown			sys_fchown
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 1583831f61a9df..aa321444a41f63 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -12,6 +12,8 @@
  * Reuse the 64-bit entry points for the x32 versions that occupy different
  * slots in the syscall table.
  */
+#define __x32_sys_readv		__x64_sys_readv
+#define __x32_sys_writev	__x64_sys_writev
 #define __x32_sys_getsockopt	__x64_sys_getsockopt
 #define __x32_sys_setsockopt	__x64_sys_setsockopt
 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 9d11028736661b..54ab4beb517f25 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -156,8 +156,8 @@
 142	i386	_newselect		sys_select			compat_sys_select
 143	i386	flock			sys_flock
 144	i386	msync			sys_msync
-145	i386	readv			sys_readv			compat_sys_readv
-146	i386	writev			sys_writev			compat_sys_writev
+145	i386	readv			sys_readv
+146	i386	writev			sys_writev
 147	i386	getsid			sys_getsid
 148	i386	fdatasync		sys_fdatasync
 149	i386	_sysctl			sys_ni_syscall
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index f30d6ae9a6883c..b1e59957c5c51c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -371,8 +371,8 @@
 512	x32	rt_sigaction		compat_sys_rt_sigaction
 513	x32	rt_sigreturn		compat_sys_x32_rt_sigreturn
 514	x32	ioctl			compat_sys_ioctl
-515	x32	readv			compat_sys_readv
-516	x32	writev			compat_sys_writev
+515	x32	readv			sys_readv
+516	x32	writev			sys_writev
 517	x32	recvfrom		compat_sys_recvfrom
 518	x32	sendmsg			compat_sys_sendmsg
 519	x32	recvmsg			compat_sys_recvmsg
diff --git a/fs/read_write.c b/fs/read_write.c
index eab427b7cc0a3f..6c13f744c34a38 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1074,13 +1074,6 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
  * in_compat_syscall().
  */
 #ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-		const struct iovec __user *, vec,
-		compat_ulong_t, vlen)
-{
-	return do_readv(fd, vec, vlen, 0);
-}
-
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
 		const struct iovec __user *, vec,
@@ -1122,13 +1115,6 @@ COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
 	return do_preadv(fd, vec, vlen, pos, flags);
 }
 
-COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
-		const struct iovec __user *, vec,
-		compat_ulong_t, vlen)
-{
-	return do_writev(fd, vec, vlen, 0);
-}
-
 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64
 COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
 		const struct iovec __user *, vec,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index b930de791ff16b..fa39b7a5488d70 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -545,10 +545,6 @@ asmlinkage long compat_sys_getdents(unsigned int fd,
 
 /* fs/read_write.c */
 asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);
-asmlinkage ssize_t compat_sys_readv(compat_ulong_t fd,
-		const struct compat_iovec __user *vec, compat_ulong_t vlen);
-asmlinkage ssize_t compat_sys_writev(compat_ulong_t fd,
-		const struct compat_iovec __user *vec, compat_ulong_t vlen);
 /* No generic prototype for pread64 and pwrite64 */
 asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
 		const struct compat_iovec __user *vec,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 995b36c2ea7d8a..211c9eacbda6eb 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -207,9 +207,9 @@ __SYSCALL(__NR_read, sys_read)
 #define __NR_write 64
 __SYSCALL(__NR_write, sys_write)
 #define __NR_readv 65
-__SC_COMP(__NR_readv, sys_readv, compat_sys_readv)
+__SC_COMP(__NR_readv, sys_readv, sys_readv)
 #define __NR_writev 66
-__SC_COMP(__NR_writev, sys_writev, compat_sys_writev)
+__SC_COMP(__NR_writev, sys_writev, sys_writev)
 #define __NR_pread64 67
 __SC_COMP(__NR_pread64, sys_pread64, compat_sys_pread64)
 #define __NR_pwrite64 68
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 995b36c2ea7d8a..211c9eacbda6eb 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -207,9 +207,9 @@ __SYSCALL(__NR_read, sys_read)
 #define __NR_write 64
 __SYSCALL(__NR_write, sys_write)
 #define __NR_readv 65
-__SC_COMP(__NR_readv, sys_readv, compat_sys_readv)
+__SC_COMP(__NR_readv, sys_readv, sys_readv)
 #define __NR_writev 66
-__SC_COMP(__NR_writev, sys_writev, compat_sys_writev)
+__SC_COMP(__NR_writev, sys_writev, sys_writev)
 #define __NR_pread64 67
 __SC_COMP(__NR_pread64, sys_pread64, compat_sys_pread64)
 #define __NR_pwrite64 68
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 3ca6fe057a0b1f..46be68029587f9 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -189,8 +189,8 @@
 142	common	_newselect			sys_select			compat_sys_select
 143	common	flock				sys_flock
 144	common	msync				sys_msync
-145	common	readv				sys_readv			compat_sys_readv
-146	common	writev				sys_writev			compat_sys_writev
+145	common	readv				sys_readv
+146	common	writev				sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
 149	nospu	_sysctl				sys_ni_syscall
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index 6a0bbea225db0d..fb5e61ce9d5838 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -134,8 +134,8 @@
 142  64		select			sys_select			-
 143  common	flock			sys_flock			sys_flock
 144  common	msync			sys_msync			compat_sys_msync
-145  common	readv			sys_readv			compat_sys_readv
-146  common	writev			sys_writev			compat_sys_writev
+145  common	readv			sys_readv
+146  common	writev			sys_writev
 147  common	getsid			sys_getsid			sys_getsid
 148  common	fdatasync		sys_fdatasync			sys_fdatasync
 149  common	_sysctl			-				-
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index f30d6ae9a6883c..b1e59957c5c51c 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -371,8 +371,8 @@
 512	x32	rt_sigaction		compat_sys_rt_sigaction
 513	x32	rt_sigreturn		compat_sys_x32_rt_sigreturn
 514	x32	ioctl			compat_sys_ioctl
-515	x32	readv			compat_sys_readv
-516	x32	writev			compat_sys_writev
+515	x32	readv			sys_readv
+516	x32	writev			sys_writev
 517	x32	recvfrom		compat_sys_recvfrom
 518	x32	sendmsg			compat_sys_sendmsg
 519	x32	recvmsg			compat_sys_recvmsg
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 7/9] fs: remove compat_sys_vmsplice
  2020-09-23  6:05 let import_iovec deal with compat_iovecs as well v3 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2020-09-23  6:05 ` [PATCH 6/9] fs: remove the compat readv/writev syscalls Christoph Hellwig
@ 2020-09-23  6:05 ` Christoph Hellwig
  2020-09-23  6:05 ` [PATCH 8/9] mm: remove compat_process_vm_{readv,writev} Christoph Hellwig
  2020-09-23  6:05 ` [PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov Christoph Hellwig
  7 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-23  6:05 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native vmsplice syscall
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/unistd32.h             |  2 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl     |  2 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl     |  2 +-
 arch/parisc/kernel/syscalls/syscall.tbl       |  2 +-
 arch/powerpc/kernel/syscalls/syscall.tbl      |  2 +-
 arch/s390/kernel/syscalls/syscall.tbl         |  2 +-
 arch/sparc/kernel/syscalls/syscall.tbl        |  2 +-
 arch/x86/entry/syscall_x32.c                  |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl        |  2 +-
 arch/x86/entry/syscalls/syscall_64.tbl        |  2 +-
 fs/splice.c                                   | 57 +++++--------------
 include/linux/compat.h                        |  4 --
 include/uapi/asm-generic/unistd.h             |  2 +-
 tools/include/uapi/asm-generic/unistd.h       |  2 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  2 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  2 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl    |  2 +-
 17 files changed, 28 insertions(+), 62 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 4a236493dca5b9..11dfae3a8563bd 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -697,7 +697,7 @@ __SYSCALL(__NR_sync_file_range2, compat_sys_aarch32_sync_file_range2)
 #define __NR_tee 342
 __SYSCALL(__NR_tee, sys_tee)
 #define __NR_vmsplice 343
-__SYSCALL(__NR_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages 344
 __SYSCALL(__NR_move_pages, compat_sys_move_pages)
 #define __NR_getcpu 345
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index c99a92646f8ee9..5a39d4de0ac85b 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -278,7 +278,7 @@
 267	n32	splice				sys_splice
 268	n32	sync_file_range			sys_sync_file_range
 269	n32	tee				sys_tee
-270	n32	vmsplice			compat_sys_vmsplice
+270	n32	vmsplice			sys_vmsplice
 271	n32	move_pages			compat_sys_move_pages
 272	n32	set_robust_list			compat_sys_set_robust_list
 273	n32	get_robust_list			compat_sys_get_robust_list
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 075064d10661bf..136efc6b8c5444 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -318,7 +318,7 @@
 304	o32	splice				sys_splice
 305	o32	sync_file_range			sys_sync_file_range		sys32_sync_file_range
 306	o32	tee				sys_tee
-307	o32	vmsplice			sys_vmsplice			compat_sys_vmsplice
+307	o32	vmsplice			sys_vmsplice
 308	o32	move_pages			sys_move_pages			compat_sys_move_pages
 309	o32	set_robust_list			sys_set_robust_list		compat_sys_set_robust_list
 310	o32	get_robust_list			sys_get_robust_list		compat_sys_get_robust_list
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 192abde0001d9d..a9e184192caedd 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -330,7 +330,7 @@
 292	32	sync_file_range		parisc_sync_file_range
 292	64	sync_file_range		sys_sync_file_range
 293	common	tee			sys_tee
-294	common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+294	common	vmsplice		sys_vmsplice
 295	common	move_pages		sys_move_pages			compat_sys_move_pages
 296	common	getcpu			sys_getcpu
 297	common	epoll_pwait		sys_epoll_pwait			compat_sys_epoll_pwait
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 6f1e2ecf0edad9..0d4985919ca34d 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -369,7 +369,7 @@
 282	common	unshare				sys_unshare
 283	common	splice				sys_splice
 284	common	tee				sys_tee
-285	common	vmsplice			sys_vmsplice			compat_sys_vmsplice
+285	common	vmsplice			sys_vmsplice
 286	common	openat				sys_openat			compat_sys_openat
 287	common	mkdirat				sys_mkdirat
 288	common	mknodat				sys_mknodat
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 6101cf2e004cb4..b5495a42814bd1 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -316,7 +316,7 @@
 306  common	splice			sys_splice			sys_splice
 307  common	sync_file_range		sys_sync_file_range		compat_sys_s390_sync_file_range
 308  common	tee			sys_tee				sys_tee
-309  common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+309  common	vmsplice		sys_vmsplice			sys_vmsplice
 310  common	move_pages		sys_move_pages			compat_sys_move_pages
 311  common	getcpu			sys_getcpu			sys_getcpu
 312  common	epoll_pwait		sys_epoll_pwait			compat_sys_epoll_pwait
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index a87ddb282ab16f..f1810c1a35caa5 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -38,7 +38,7 @@
 23	64    	setuid			sys_setuid
 24	32	getuid			sys_getuid16
 24	64   	getuid			sys_getuid
-25	common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+25	common	vmsplice		sys_vmsplice
 26	common	ptrace			sys_ptrace			compat_sys_ptrace
 27	common	alarm			sys_alarm
 28	common	sigaltstack		sys_sigaltstack			compat_sys_sigaltstack
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index aa321444a41f63..a4840b9d50ad14 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -16,6 +16,7 @@
 #define __x32_sys_writev	__x64_sys_writev
 #define __x32_sys_getsockopt	__x64_sys_getsockopt
 #define __x32_sys_setsockopt	__x64_sys_setsockopt
+#define __x32_sys_vmsplice	__x64_sys_vmsplice
 
 #define __SYSCALL_64(nr, sym)
 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 54ab4beb517f25..0fb2f172581e51 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -327,7 +327,7 @@
 313	i386	splice			sys_splice
 314	i386	sync_file_range		sys_ia32_sync_file_range
 315	i386	tee			sys_tee
-316	i386	vmsplice		sys_vmsplice			compat_sys_vmsplice
+316	i386	vmsplice		sys_vmsplice
 317	i386	move_pages		sys_move_pages			compat_sys_move_pages
 318	i386	getcpu			sys_getcpu
 319	i386	epoll_pwait		sys_epoll_pwait
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index b1e59957c5c51c..642af919183de4 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -388,7 +388,7 @@
 529	x32	waitid			compat_sys_waitid
 530	x32	set_robust_list		compat_sys_set_robust_list
 531	x32	get_robust_list		compat_sys_get_robust_list
-532	x32	vmsplice		compat_sys_vmsplice
+532	x32	vmsplice		sys_vmsplice
 533	x32	move_pages		compat_sys_move_pages
 534	x32	preadv			compat_sys_preadv64
 535	x32	pwritev			compat_sys_pwritev64
diff --git a/fs/splice.c b/fs/splice.c
index 132d42b9871f9b..18d84544030b39 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -33,7 +33,6 @@
 #include <linux/security.h>
 #include <linux/gfp.h>
 #include <linux/socket.h>
-#include <linux/compat.h>
 #include <linux/sched/signal.h>
 
 #include "internal.h"
@@ -1332,20 +1331,6 @@ static int vmsplice_type(struct fd f, int *type)
  * Currently we punt and implement it as a normal copy, see pipe_to_user().
  *
  */
-static long do_vmsplice(struct file *f, struct iov_iter *iter, unsigned int flags)
-{
-	if (unlikely(flags & ~SPLICE_F_ALL))
-		return -EINVAL;
-
-	if (!iov_iter_count(iter))
-		return 0;
-
-	if (iov_iter_rw(iter) = WRITE)
-		return vmsplice_to_pipe(f, iter, flags);
-	else
-		return vmsplice_to_user(f, iter, flags);
-}
-
 SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 		unsigned long, nr_segs, unsigned int, flags)
 {
@@ -1356,6 +1341,9 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 	struct fd f;
 	int type;
 
+	if (unlikely(flags & ~SPLICE_F_ALL))
+		return -EINVAL;
+
 	f = fdget(fd);
 	error = vmsplice_type(f, &type);
 	if (error)
@@ -1363,40 +1351,21 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 
 	error = import_iovec(type, uiov, nr_segs,
 			     ARRAY_SIZE(iovstack), &iov, &iter);
-	if (error >= 0) {
-		error = do_vmsplice(f.file, &iter, flags);
-		kfree(iov);
-	}
-	fdput(f);
-	return error;
-}
+	if (error < 0)
+		goto out_fdput;
 
-#ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, iov32,
-		    unsigned int, nr_segs, unsigned int, flags)
-{
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
-	ssize_t error;
-	struct fd f;
-	int type;
-
-	f = fdget(fd);
-	error = vmsplice_type(f, &type);
-	if (error)
-		return error;
+	if (!iov_iter_count(&iter))
+		error = 0;
+	else if (iov_iter_rw(&iter) = WRITE)
+		error = vmsplice_to_pipe(f.file, &iter, flags);
+	else
+		error = vmsplice_to_user(f.file, &iter, flags);
 
-	error = import_iovec(type, (struct iovec __user *)iov32, nr_segs,
-			     ARRAY_SIZE(iovstack), &iov, &iter);
-	if (error >= 0) {
-		error = do_vmsplice(f.file, &iter, flags);
-		kfree(iov);
-	}
+	kfree(iov);
+out_fdput:
 	fdput(f);
 	return error;
 }
-#endif
 
 SYSCALL_DEFINE6(splice, int, fd_in, loff_t __user *, off_in,
 		int, fd_out, loff_t __user *, off_out,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index fa39b7a5488d70..14d8bf412bd02e 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -597,10 +597,6 @@ asmlinkage long compat_sys_signalfd4(int ufd,
 				     const compat_sigset_t __user *sigmask,
 				     compat_size_t sigsetsize, int flags);
 
-/* fs/splice.c */
-asmlinkage long compat_sys_vmsplice(int fd, const struct compat_iovec __user *,
-				    unsigned int nr_segs, unsigned int flags);
-
 /* fs/stat.c */
 asmlinkage long compat_sys_newfstatat(unsigned int dfd,
 				      const char __user *filename,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 211c9eacbda6eb..f2dcb0d5703014 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -237,7 +237,7 @@ __SC_COMP(__NR_signalfd4, sys_signalfd4, compat_sys_signalfd4)
 
 /* fs/splice.c */
 #define __NR_vmsplice 75
-__SC_COMP(__NR_vmsplice, sys_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_splice 76
 __SYSCALL(__NR_splice, sys_splice)
 #define __NR_tee 77
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 211c9eacbda6eb..f2dcb0d5703014 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -237,7 +237,7 @@ __SC_COMP(__NR_signalfd4, sys_signalfd4, compat_sys_signalfd4)
 
 /* fs/splice.c */
 #define __NR_vmsplice 75
-__SC_COMP(__NR_vmsplice, sys_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_splice 76
 __SYSCALL(__NR_splice, sys_splice)
 #define __NR_tee 77
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 46be68029587f9..26f0347c15118b 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -363,7 +363,7 @@
 282	common	unshare				sys_unshare
 283	common	splice				sys_splice
 284	common	tee				sys_tee
-285	common	vmsplice			sys_vmsplice			compat_sys_vmsplice
+285	common	vmsplice			sys_vmsplice
 286	common	openat				sys_openat			compat_sys_openat
 287	common	mkdirat				sys_mkdirat
 288	common	mknodat				sys_mknodat
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index fb5e61ce9d5838..02ad81f69bb7e3 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -316,7 +316,7 @@
 306  common	splice			sys_splice			compat_sys_splice
 307  common	sync_file_range		sys_sync_file_range		compat_sys_s390_sync_file_range
 308  common	tee			sys_tee				compat_sys_tee
-309  common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+309  common	vmsplice		sys_vmsplice			sys_vmsplice
 310  common	move_pages		sys_move_pages			compat_sys_move_pages
 311  common	getcpu			sys_getcpu			compat_sys_getcpu
 312  common	epoll_pwait		sys_epoll_pwait			compat_sys_epoll_pwait
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index b1e59957c5c51c..642af919183de4 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -388,7 +388,7 @@
 529	x32	waitid			compat_sys_waitid
 530	x32	set_robust_list		compat_sys_set_robust_list
 531	x32	get_robust_list		compat_sys_get_robust_list
-532	x32	vmsplice		compat_sys_vmsplice
+532	x32	vmsplice		sys_vmsplice
 533	x32	move_pages		compat_sys_move_pages
 534	x32	preadv			compat_sys_preadv64
 535	x32	pwritev			compat_sys_pwritev64
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 8/9] mm: remove compat_process_vm_{readv,writev}
  2020-09-23  6:05 let import_iovec deal with compat_iovecs as well v3 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2020-09-23  6:05 ` [PATCH 7/9] fs: remove compat_sys_vmsplice Christoph Hellwig
@ 2020-09-23  6:05 ` Christoph Hellwig
  2020-09-23  6:05 ` [PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov Christoph Hellwig
  7 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-23  6:05 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native syscalls
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/unistd32.h             |  4 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl     |  4 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl     |  4 +-
 arch/parisc/kernel/syscalls/syscall.tbl       |  4 +-
 arch/powerpc/kernel/syscalls/syscall.tbl      |  4 +-
 arch/s390/kernel/syscalls/syscall.tbl         |  4 +-
 arch/sparc/kernel/syscalls/syscall.tbl        |  4 +-
 arch/x86/entry/syscall_x32.c                  |  2 +
 arch/x86/entry/syscalls/syscall_32.tbl        |  4 +-
 arch/x86/entry/syscalls/syscall_64.tbl        |  4 +-
 include/linux/compat.h                        |  8 ---
 include/uapi/asm-generic/unistd.h             |  6 +-
 mm/process_vm_access.c                        | 69 -------------------
 tools/include/uapi/asm-generic/unistd.h       |  6 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  4 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  4 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl    |  4 +-
 17 files changed, 30 insertions(+), 109 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 11dfae3a8563bd..0c280a05f699bf 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -763,9 +763,9 @@ __SYSCALL(__NR_sendmmsg, compat_sys_sendmmsg)
 #define __NR_setns 375
 __SYSCALL(__NR_setns, sys_setns)
 #define __NR_process_vm_readv 376
-__SYSCALL(__NR_process_vm_readv, compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 377
-__SYSCALL(__NR_process_vm_writev, compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 378
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 379
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 5a39d4de0ac85b..0bc2e0fcf1ee56 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -317,8 +317,8 @@
 306	n32	syncfs				sys_syncfs
 307	n32	sendmmsg			compat_sys_sendmmsg
 308	n32	setns				sys_setns
-309	n32	process_vm_readv		compat_sys_process_vm_readv
-310	n32	process_vm_writev		compat_sys_process_vm_writev
+309	n32	process_vm_readv		sys_process_vm_readv
+310	n32	process_vm_writev		sys_process_vm_writev
 311	n32	kcmp				sys_kcmp
 312	n32	finit_module			sys_finit_module
 313	n32	sched_setattr			sys_sched_setattr
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 136efc6b8c5444..b408c13b934296 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -356,8 +356,8 @@
 342	o32	syncfs				sys_syncfs
 343	o32	sendmmsg			sys_sendmmsg			compat_sys_sendmmsg
 344	o32	setns				sys_setns
-345	o32	process_vm_readv		sys_process_vm_readv		compat_sys_process_vm_readv
-346	o32	process_vm_writev		sys_process_vm_writev		compat_sys_process_vm_writev
+345	o32	process_vm_readv		sys_process_vm_readv
+346	o32	process_vm_writev		sys_process_vm_writev
 347	o32	kcmp				sys_kcmp
 348	o32	finit_module			sys_finit_module
 349	o32	sched_setattr			sys_sched_setattr
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index a9e184192caedd..2015a5124b78ad 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -372,8 +372,8 @@
 327	common	syncfs			sys_syncfs
 328	common	setns			sys_setns
 329	common	sendmmsg		sys_sendmmsg			compat_sys_sendmmsg
-330	common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-331	common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+330	common	process_vm_readv	sys_process_vm_readv
+331	common	process_vm_writev	sys_process_vm_writev
 332	common	kcmp			sys_kcmp
 333	common	finit_module		sys_finit_module
 334	common	sched_setattr		sys_sched_setattr
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 0d4985919ca34d..66a472aa635d3f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -449,8 +449,8 @@
 348	common	syncfs				sys_syncfs
 349	common	sendmmsg			sys_sendmmsg			compat_sys_sendmmsg
 350	common	setns				sys_setns
-351	nospu	process_vm_readv		sys_process_vm_readv		compat_sys_process_vm_readv
-352	nospu	process_vm_writev		sys_process_vm_writev		compat_sys_process_vm_writev
+351	nospu	process_vm_readv		sys_process_vm_readv
+352	nospu	process_vm_writev		sys_process_vm_writev
 353	nospu	finit_module			sys_finit_module
 354	nospu	kcmp				sys_kcmp
 355	common	sched_setattr			sys_sched_setattr
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index b5495a42814bd1..7485867a490bb2 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -347,8 +347,8 @@
 337  common	clock_adjtime		sys_clock_adjtime		sys_clock_adjtime32
 338  common	syncfs			sys_syncfs			sys_syncfs
 339  common	setns			sys_setns			sys_setns
-340  common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-341  common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+340  common	process_vm_readv	sys_process_vm_readv		sys_process_vm_readv
+341  common	process_vm_writev	sys_process_vm_writev		sys_process_vm_writev
 342  common	s390_runtime_instr	sys_s390_runtime_instr		sys_s390_runtime_instr
 343  common	kcmp			sys_kcmp			sys_kcmp
 344  common	finit_module		sys_finit_module		sys_finit_module
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index f1810c1a35caa5..4a9365b2e340b2 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -406,8 +406,8 @@
 335	common	syncfs			sys_syncfs
 336	common	sendmmsg		sys_sendmmsg			compat_sys_sendmmsg
 337	common	setns			sys_setns
-338	common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-339	common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+338	common	process_vm_readv	sys_process_vm_readv
+339	common	process_vm_writev	sys_process_vm_writev
 340	32	kern_features		sys_ni_syscall			sys_kern_features
 340	64	kern_features		sys_kern_features
 341	common	kcmp			sys_kcmp
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index a4840b9d50ad14..f2fe0a33bcfdd5 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -17,6 +17,8 @@
 #define __x32_sys_getsockopt	__x64_sys_getsockopt
 #define __x32_sys_setsockopt	__x64_sys_setsockopt
 #define __x32_sys_vmsplice	__x64_sys_vmsplice
+#define __x32_sys_process_vm_readv	__x64_sys_process_vm_readv
+#define __x32_sys_process_vm_writev	__x64_sys_process_vm_writev
 
 #define __SYSCALL_64(nr, sym)
 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 0fb2f172581e51..5fbe10ad8a23fc 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -358,8 +358,8 @@
 344	i386	syncfs			sys_syncfs
 345	i386	sendmmsg		sys_sendmmsg			compat_sys_sendmmsg
 346	i386	setns			sys_setns
-347	i386	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-348	i386	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+347	i386	process_vm_readv	sys_process_vm_readv
+348	i386	process_vm_writev	sys_process_vm_writev
 349	i386	kcmp			sys_kcmp
 350	i386	finit_module		sys_finit_module
 351	i386	sched_setattr		sys_sched_setattr
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 642af919183de4..347809649ba28f 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -395,8 +395,8 @@
 536	x32	rt_tgsigqueueinfo	compat_sys_rt_tgsigqueueinfo
 537	x32	recvmmsg		compat_sys_recvmmsg_time64
 538	x32	sendmmsg		compat_sys_sendmmsg
-539	x32	process_vm_readv	compat_sys_process_vm_readv
-540	x32	process_vm_writev	compat_sys_process_vm_writev
+539	x32	process_vm_readv	sys_process_vm_readv
+540	x32	process_vm_writev	sys_process_vm_writev
 541	x32	setsockopt		sys_setsockopt
 542	x32	getsockopt		sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 14d8bf412bd02e..d46e5905970817 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -780,14 +780,6 @@ asmlinkage long compat_sys_open_by_handle_at(int mountdirfd,
 					     int flags);
 asmlinkage long compat_sys_sendmmsg(int fd, struct compat_mmsghdr __user *mmsg,
 				    unsigned vlen, unsigned int flags);
-asmlinkage ssize_t compat_sys_process_vm_readv(compat_pid_t pid,
-		const struct compat_iovec __user *lvec,
-		compat_ulong_t liovcnt, const struct compat_iovec __user *rvec,
-		compat_ulong_t riovcnt, compat_ulong_t flags);
-asmlinkage ssize_t compat_sys_process_vm_writev(compat_pid_t pid,
-		const struct compat_iovec __user *lvec,
-		compat_ulong_t liovcnt, const struct compat_iovec __user *rvec,
-		compat_ulong_t riovcnt, compat_ulong_t flags);
 asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
 		     const compat_uptr_t __user *argv,
 		     const compat_uptr_t __user *envp, int flags);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f2dcb0d5703014..c1dfe99c9c3f70 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -727,11 +727,9 @@ __SYSCALL(__NR_setns, sys_setns)
 #define __NR_sendmmsg 269
 __SC_COMP(__NR_sendmmsg, sys_sendmmsg, compat_sys_sendmmsg)
 #define __NR_process_vm_readv 270
-__SC_COMP(__NR_process_vm_readv, sys_process_vm_readv, \
-          compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 271
-__SC_COMP(__NR_process_vm_writev, sys_process_vm_writev, \
-          compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 272
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 273
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 3f2156aab44263..fd12da80b6f27b 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -14,10 +14,6 @@
 #include <linux/slab.h>
 #include <linux/syscalls.h>
 
-#ifdef CONFIG_COMPAT
-#include <linux/compat.h>
-#endif
-
 /**
  * process_vm_rw_pages - read/write pages from task specified
  * @pages: array of pointers to pages we want to copy
@@ -304,68 +300,3 @@ SYSCALL_DEFINE6(process_vm_writev, pid_t, pid,
 {
 	return process_vm_rw(pid, lvec, liovcnt, rvec, riovcnt, flags, 1);
 }
-
-#ifdef CONFIG_COMPAT
-
-static ssize_t
-compat_process_vm_rw(compat_pid_t pid,
-		     const struct compat_iovec __user *lvec,
-		     unsigned long liovcnt,
-		     const struct compat_iovec __user *rvec,
-		     unsigned long riovcnt,
-		     unsigned long flags, int vm_write)
-{
-	struct iovec iovstack_l[UIO_FASTIOV];
-	struct iovec iovstack_r[UIO_FASTIOV];
-	struct iovec *iov_l = iovstack_l;
-	struct iovec *iov_r = iovstack_r;
-	struct iov_iter iter;
-	ssize_t rc = -EFAULT;
-	int dir = vm_write ? WRITE : READ;
-
-	if (flags != 0)
-		return -EINVAL;
-
-	rc = import_iovec(dir, (const struct iovec __user *)lvec, liovcnt,
-			  UIO_FASTIOV, &iov_l, &iter);
-	if (rc < 0)
-		return rc;
-	if (!iov_iter_count(&iter))
-		goto free_iov_l;
-	iov_r = iovec_from_user((const struct iovec __user *)rvec, riovcnt,
-				UIO_FASTIOV, iovstack_r, true);
-	if (IS_ERR(iov_r)) {
-		rc = PTR_ERR(iov_r);
-		goto free_iov_l;
-	}
-	rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
-	if (iov_r != iovstack_r)
-		kfree(iov_r);
-free_iov_l:
-	kfree(iov_l);
-	return rc;
-}
-
-COMPAT_SYSCALL_DEFINE6(process_vm_readv, compat_pid_t, pid,
-		       const struct compat_iovec __user *, lvec,
-		       compat_ulong_t, liovcnt,
-		       const struct compat_iovec __user *, rvec,
-		       compat_ulong_t, riovcnt,
-		       compat_ulong_t, flags)
-{
-	return compat_process_vm_rw(pid, lvec, liovcnt, rvec,
-				    riovcnt, flags, 0);
-}
-
-COMPAT_SYSCALL_DEFINE6(process_vm_writev, compat_pid_t, pid,
-		       const struct compat_iovec __user *, lvec,
-		       compat_ulong_t, liovcnt,
-		       const struct compat_iovec __user *, rvec,
-		       compat_ulong_t, riovcnt,
-		       compat_ulong_t, flags)
-{
-	return compat_process_vm_rw(pid, lvec, liovcnt, rvec,
-				    riovcnt, flags, 1);
-}
-
-#endif
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index f2dcb0d5703014..c1dfe99c9c3f70 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -727,11 +727,9 @@ __SYSCALL(__NR_setns, sys_setns)
 #define __NR_sendmmsg 269
 __SC_COMP(__NR_sendmmsg, sys_sendmmsg, compat_sys_sendmmsg)
 #define __NR_process_vm_readv 270
-__SC_COMP(__NR_process_vm_readv, sys_process_vm_readv, \
-          compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 271
-__SC_COMP(__NR_process_vm_writev, sys_process_vm_writev, \
-          compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 272
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 273
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 26f0347c15118b..a188f053cbf90a 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -443,8 +443,8 @@
 348	common	syncfs				sys_syncfs
 349	common	sendmmsg			sys_sendmmsg			compat_sys_sendmmsg
 350	common	setns				sys_setns
-351	nospu	process_vm_readv		sys_process_vm_readv		compat_sys_process_vm_readv
-352	nospu	process_vm_writev		sys_process_vm_writev		compat_sys_process_vm_writev
+351	nospu	process_vm_readv		sys_process_vm_readv
+352	nospu	process_vm_writev		sys_process_vm_writev
 353	nospu	finit_module			sys_finit_module
 354	nospu	kcmp				sys_kcmp
 355	common	sched_setattr			sys_sched_setattr
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index 02ad81f69bb7e3..c44c83032c3a04 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -347,8 +347,8 @@
 337  common	clock_adjtime		sys_clock_adjtime		compat_sys_clock_adjtime
 338  common	syncfs			sys_syncfs			sys_syncfs
 339  common	setns			sys_setns			sys_setns
-340  common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-341  common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+340  common	process_vm_readv	sys_process_vm_readv		sys_process_vm_readv
+341  common	process_vm_writev	sys_process_vm_writev		sys_process_vm_writev
 342  common	s390_runtime_instr	sys_s390_runtime_instr		sys_s390_runtime_instr
 343  common	kcmp			sys_kcmp			compat_sys_kcmp
 344  common	finit_module		sys_finit_module		compat_sys_finit_module
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index 642af919183de4..347809649ba28f 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -395,8 +395,8 @@
 536	x32	rt_tgsigqueueinfo	compat_sys_rt_tgsigqueueinfo
 537	x32	recvmmsg		compat_sys_recvmmsg_time64
 538	x32	sendmmsg		compat_sys_sendmmsg
-539	x32	process_vm_readv	compat_sys_process_vm_readv
-540	x32	process_vm_writev	compat_sys_process_vm_writev
+539	x32	process_vm_readv	sys_process_vm_readv
+540	x32	process_vm_writev	sys_process_vm_writev
 541	x32	setsockopt		sys_setsockopt
 542	x32	getsockopt		sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov
  2020-09-23  6:05 let import_iovec deal with compat_iovecs as well v3 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2020-09-23  6:05 ` [PATCH 8/9] mm: remove compat_process_vm_{readv,writev} Christoph Hellwig
@ 2020-09-23  6:05 ` Christoph Hellwig
  7 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-23  6:05 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native version of
keyctl_instantiate_key_iov can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 security/keys/compat.c   | 36 ++----------------------------------
 security/keys/internal.h |  5 -----
 security/keys/keyctl.c   |  2 +-
 3 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/security/keys/compat.c b/security/keys/compat.c
index 7ae531db031cf8..1545efdca56227 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -11,38 +11,6 @@
 #include <linux/slab.h>
 #include "internal.h"
 
-/*
- * Instantiate a key with the specified compatibility multipart payload and
- * link the key into the destination keyring if one is given.
- *
- * The caller must have the appropriate instantiation permit set for this to
- * work (see keyctl_assume_authority).  No other permissions are required.
- *
- * If successful, 0 will be returned.
- */
-static long compat_keyctl_instantiate_key_iov(
-	key_serial_t id,
-	const struct compat_iovec __user *_payload_iov,
-	unsigned ioc,
-	key_serial_t ringid)
-{
-	struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
-	struct iov_iter from;
-	long ret;
-
-	if (!_payload_iov)
-		ioc = 0;
-
-	ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
-			   ioc, ARRAY_SIZE(iovstack), &iov, &from);
-	if (ret < 0)
-		return ret;
-
-	ret = keyctl_instantiate_key_common(id, &from, ringid);
-	kfree(iov);
-	return ret;
-}
-
 /*
  * The key control system call, 32-bit compatibility version for 64-bit archs
  */
@@ -113,8 +81,8 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option,
 		return keyctl_reject_key(arg2, arg3, arg4, arg5);
 
 	case KEYCTL_INSTANTIATE_IOV:
-		return compat_keyctl_instantiate_key_iov(
-			arg2, compat_ptr(arg3), arg4, arg5);
+		return keyctl_instantiate_key_iov(arg2, compat_ptr(arg3), arg4,
+						  arg5);
 
 	case KEYCTL_INVALIDATE:
 		return keyctl_invalidate_key(arg2);
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 338a526cbfa516..9b9cf3b6fcbb4d 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -262,11 +262,6 @@ extern long keyctl_instantiate_key_iov(key_serial_t,
 				       const struct iovec __user *,
 				       unsigned, key_serial_t);
 extern long keyctl_invalidate_key(key_serial_t);
-
-struct iov_iter;
-extern long keyctl_instantiate_key_common(key_serial_t,
-					  struct iov_iter *,
-					  key_serial_t);
 extern long keyctl_restrict_keyring(key_serial_t id,
 				    const char __user *_type,
 				    const char __user *_restriction);
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 9febd37a168fd0..e26bbccda7ccee 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -1164,7 +1164,7 @@ static int keyctl_change_reqkey_auth(struct key *key)
  *
  * If successful, 0 will be returned.
  */
-long keyctl_instantiate_key_common(key_serial_t id,
+static long keyctl_instantiate_key_common(key_serial_t id,
 				   struct iov_iter *from,
 				   key_serial_t ringid)
 {
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-22  9:01                                 ` Arnd Bergmann
  2020-09-22 16:20                                   ` Andy Lutomirski
@ 2020-09-23  8:01                                   ` Pavel Begunkov
  2020-09-23 13:22                                     ` Al Viro
  1 sibling, 1 reply; 171+ messages in thread
From: Pavel Begunkov @ 2020-09-23  8:01 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andy Lutomirski, Christoph Hellwig, Al Viro, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On 22/09/2020 12:01, Arnd Bergmann wrote:
> On Tue, Sep 22, 2020 at 9:59 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>> On 22/09/2020 10:23, Arnd Bergmann wrote:
>>> On Tue, Sep 22, 2020 at 8:32 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>> On 22/09/2020 03:58, Andy Lutomirski wrote:
>>>>> On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>> I may be looking at a different kernel than you, but aren't you
>>>>> preventing creating an io_uring regardless of whether SQPOLL is
>>>>> requested?
>>>>
>>>> I diffed a not-saved file on a sleepy head, thanks for noticing.
>>>> As you said, there should be an SQPOLL check.
>>>>
>>>> ...
>>>> if (ctx->compat && (p->flags & IORING_SETUP_SQPOLL))
>>>>         goto err;
>>>
>>> Wouldn't that mean that now 32-bit containers behave differently
>>> between compat and native execution?
>>>
>>> I think if you want to prevent 32-bit applications from using SQPOLL,
>>> it needs to be done the same way on both to be consistent:
>>
>> The intention was to disable only compat not native 32-bit.
> 
> I'm not following why that would be considered a valid option,
> as that clearly breaks existing users that update from a 32-bit
> kernel to a 64-bit one.

Do you mean users who move 32-bit binaries (without recompiling) to a
new x64 kernel? Does the kernel guarantees that to work? I'd personally
care more native-bitness apps.

> 
> Taking away the features from users that are still on 32-bit kernels
> already seems questionable to me, but being inconsistent
> about it seems much worse, in particular when the regression
> is on the upgrade path.

TBH, this won't fix that entirely (e.g. passing non-compat io_uring
to a compat process should yield the same problem). So, let's put
it aside for now until this bikeshedding would be relevant.

> 
>>> Can we expect all existing and future user space to have a sane
>>> fallback when IORING_SETUP_SQPOLL fails?
>>
>> SQPOLL has a few differences with non-SQPOLL modes, but it's easy
>> to convert between them. Anyway, SQPOLL is a privileged special
>> case that's here for performance/latency reasons, I don't think
>> there will be any non-accidental users of it.
> 
> Ok, so the behavior of 32-bit tasks would be the same as running
> the same application as unprivileged 64-bit tasks, with applications

Yes, something like that, but that's not automatic and in some
(hopefully rare) cases there may be pitfalls. That's in short,
I can expand the idea a bit if anyone would be interested.

> already having to implement that fallback, right?

Well, not everyone _have_ to implement such a fallback, e.g.
applications working only whilst privileged may use SQPOLL only.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
  2020-09-23  8:01                                   ` Pavel Begunkov
@ 2020-09-23 13:22                                     ` Al Viro
  0 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-09-23 13:22 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Arnd Bergmann, Andy Lutomirski, Christoph Hellwig, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, X86 ML, LKML,
	open list:MIPS, Parisc List, linuxppc-dev, linux-s390,
	sparclinux, linux-block, Linux SCSI List, Linux FS Devel,
	linux-aio, io-uring, linux-arch, Linux-MM, Network Development,
	keyrings, LSM List

On Wed, Sep 23, 2020 at 11:01:34AM +0300, Pavel Begunkov wrote:

> > I'm not following why that would be considered a valid option,
> > as that clearly breaks existing users that update from a 32-bit
> > kernel to a 64-bit one.
> 
> Do you mean users who move 32-bit binaries (without recompiling) to a
> new x64 kernel? Does the kernel guarantees that to work?

Yes.

No further (printable) comments for now...

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec
  2020-09-23  6:05 ` [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec Christoph Hellwig
@ 2020-09-23 14:16   ` Al Viro
  2020-09-23 14:38     ` David Laight
  2020-09-23 14:40     ` Al Viro
  0 siblings, 2 replies; 171+ messages in thread
From: Al Viro @ 2020-09-23 14:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module,
	Linus Torvalds

On Wed, Sep 23, 2020 at 08:05:41AM +0200, Christoph Hellwig wrote:

> +struct iovec *iovec_from_user(const struct iovec __user *uvec,
> +		unsigned long nr_segs, unsigned long fast_segs,

Hmm...  For fast_segs unsigned long had always been ridiculous
(4G struct iovec on caller stack frame?), but that got me wondering about
nr_segs and I wish I'd thought of that when introducing import_iovec().

The thing is, import_iovec() takes unsigned int there.  Which is fine
(hell, the maximal value that can be accepted in 1024), except that
we do pass unsigned long syscall argument to it in some places.

E.g. vfs_readv() quietly truncates vlen to 32 bits, and vlen can
come unchanged through sys_readv() -> do_readv() -> vfs_readv().
With unsigned long passed by syscall glue.

AFAICS, passing 4G+1 as the third argument to readv(2) on 64bit box
will be quietly treated as 1 these days.  Which would be fine, except
that before "switch {compat_,}do_readv_writev() to {compat_,}import_iovec()"
it used to fail with -EINVAL.

Userland, BTW, describes readv(2) iovcnt as int; process_vm_readv(),
OTOH, has these counts unsigned long from the userland POV...

I suppose we ought to switch import_iovec() to unsigned long for nr_segs ;-/
Strictly speaking that had been a userland ABI change, even though nothing
except regression tests checking for expected errors would've been likely
to notice.  And it looks like no regression tests covered that one...

Linus, does that qualify for your "if no userland has noticed the change,
it's not a breakage"?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec
  2020-09-23 14:16   ` Al Viro
@ 2020-09-23 14:38     ` David Laight
  2020-09-23 14:49       ` Al Viro
  2020-09-23 14:40     ` Al Viro
  1 sibling, 1 reply; 171+ messages in thread
From: David Laight @ 2020-09-23 14:38 UTC (permalink / raw)
  To: 'Al Viro', Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module, Linus Torvalds

From: Al Viro
> Sent: 23 September 2020 15:17
> 
> On Wed, Sep 23, 2020 at 08:05:41AM +0200, Christoph Hellwig wrote:
> 
> > +struct iovec *iovec_from_user(const struct iovec __user *uvec,
> > +		unsigned long nr_segs, unsigned long fast_segs,
> 
> Hmm...  For fast_segs unsigned long had always been ridiculous
> (4G struct iovec on caller stack frame?), but that got me wondering about
> nr_segs and I wish I'd thought of that when introducing import_iovec().
> 
> The thing is, import_iovec() takes unsigned int there.  Which is fine
> (hell, the maximal value that can be accepted in 1024), except that
> we do pass unsigned long syscall argument to it in some places.

It will make diddly-squit difference.
The parameters end up in registers on most calling conventions.
Plausibly you get an extra 'REX' byte on x86 for the 64bit value.
What you want to avoid is explicit sign/zero extension and value
masking after arithmetic.

On x86-64 the 'horrid' type is actually 'signed int'.
It often needs sign extending to 64bits (eg when being
used as an array subscript).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec
  2020-09-23 14:16   ` Al Viro
  2020-09-23 14:38     ` David Laight
@ 2020-09-23 14:40     ` Al Viro
  1 sibling, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-09-23 14:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module,
	Linus Torvalds

On Wed, Sep 23, 2020 at 03:16:54PM +0100, Al Viro wrote:
> On Wed, Sep 23, 2020 at 08:05:41AM +0200, Christoph Hellwig wrote:
> 
> > +struct iovec *iovec_from_user(const struct iovec __user *uvec,
> > +		unsigned long nr_segs, unsigned long fast_segs,
> 
> Hmm...  For fast_segs unsigned long had always been ridiculous
> (4G struct iovec on caller stack frame?), but that got me wondering about
> nr_segs and I wish I'd thought of that when introducing import_iovec().
> 
> The thing is, import_iovec() takes unsigned int there.  Which is fine
> (hell, the maximal value that can be accepted in 1024), except that
> we do pass unsigned long syscall argument to it in some places.
> 
> E.g. vfs_readv() quietly truncates vlen to 32 bits, and vlen can
> come unchanged through sys_readv() -> do_readv() -> vfs_readv().
> With unsigned long passed by syscall glue.
> 
> AFAICS, passing 4G+1 as the third argument to readv(2) on 64bit box
> will be quietly treated as 1 these days.  Which would be fine, except
> that before "switch {compat_,}do_readv_writev() to {compat_,}import_iovec()"
> it used to fail with -EINVAL.
> 
> Userland, BTW, describes readv(2) iovcnt as int; process_vm_readv(),
> OTOH, has these counts unsigned long from the userland POV...
> 
> I suppose we ought to switch import_iovec() to unsigned long for nr_segs ;-/
> Strictly speaking that had been a userland ABI change, even though nothing
> except regression tests checking for expected errors would've been likely
> to notice.  And it looks like no regression tests covered that one...
> 
> Linus, does that qualify for your "if no userland has noticed the change,
> it's not a breakage"?

Egads...  We have sys_readv() with unsigned long for file descriptor, since
1.3.31 when it had been introduced.  And originally it did comparison with
NR_OPEN right in sys_readv().  Then in 2.1.60 it had been switched to
fget(), which used to take unsigned long at that point.  And in 2.1.90pre1
it went unsigned int, so non-zero upper 32 bits in readv(2) first argument
ceased to cause EBADF...

Of course, libc had it as int fd all along.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec
  2020-09-23 14:38     ` David Laight
@ 2020-09-23 14:49       ` Al Viro
  0 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-09-23 14:49 UTC (permalink / raw)
  To: David Laight
  Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module,
	Linus Torvalds

On Wed, Sep 23, 2020 at 02:38:24PM +0000, David Laight wrote:
> From: Al Viro
> > Sent: 23 September 2020 15:17
> > 
> > On Wed, Sep 23, 2020 at 08:05:41AM +0200, Christoph Hellwig wrote:
> > 
> > > +struct iovec *iovec_from_user(const struct iovec __user *uvec,
> > > +		unsigned long nr_segs, unsigned long fast_segs,
> > 
> > Hmm...  For fast_segs unsigned long had always been ridiculous
> > (4G struct iovec on caller stack frame?), but that got me wondering about
> > nr_segs and I wish I'd thought of that when introducing import_iovec().
> > 
> > The thing is, import_iovec() takes unsigned int there.  Which is fine
> > (hell, the maximal value that can be accepted in 1024), except that
> > we do pass unsigned long syscall argument to it in some places.
> 
> It will make diddly-squit difference.
> The parameters end up in registers on most calling conventions.
> Plausibly you get an extra 'REX' byte on x86 for the 64bit value.
> What you want to avoid is explicit sign/zero extension and value
> masking after arithmetic.

Don't tell me what I want; your telepathic abilities are consistently sucky.

I am *NOT* talking about microoptimization here.  I have described
the behaviour change of syscall caused by commit 5 years ago.  Which is
generally considered a problem.  Then I asked whether that behaviour
change would fall under the "if nobody noticed, it's not a userland ABI
breakage" exception.

Could you show me the point where I have expressed concerns about
the quality of amd64 code generated for that thing, before or after
the change in question?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* let import_iovec deal with compat_iovecs as well v4
  2020-09-18 12:45 let import_iovec deal with compat_iovecs as well Christoph Hellwig
                   ` (10 preceding siblings ...)
  2020-09-21 14:34 ` let import_iovec deal with compat_iovecs as well v2 Christoph Hellwig
@ 2020-09-25  4:51 ` Christoph Hellwig
  2020-09-25  4:51   ` [PATCH 1/9] compat.h: fix a spelling error in <linux/compat.h> Christoph Hellwig
                     ` (9 more replies)
  11 siblings, 10 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-25  4:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Hi Al,

this series changes import_iovec to transparently deal with compat iovec
structures, and then cleanups up a lot of code dupliation.

Changes since v3:
 - fix up changed prototypes in compat.h as well

Changes since v2:
 - revert the switch of the access process vm sysclls to iov_iter
 - refactor the import_iovec internals differently
 - switch aio to use __import_iovec

Changes since v1:
 - improve a commit message
 - drop a pointless unlikely
 - drop the PF_FORCE_COMPAT flag
 - add a few more cleanups (including two from David Laight)

Diffstat:
 arch/arm64/include/asm/unistd32.h                  |   10 
 arch/mips/kernel/syscalls/syscall_n32.tbl          |   10 
 arch/mips/kernel/syscalls/syscall_o32.tbl          |   10 
 arch/parisc/kernel/syscalls/syscall.tbl            |   10 
 arch/powerpc/kernel/syscalls/syscall.tbl           |   10 
 arch/s390/kernel/syscalls/syscall.tbl              |   10 
 arch/sparc/kernel/syscalls/syscall.tbl             |   10 
 arch/x86/entry/syscall_x32.c                       |    5 
 arch/x86/entry/syscalls/syscall_32.tbl             |   10 
 arch/x86/entry/syscalls/syscall_64.tbl             |   10 
 block/scsi_ioctl.c                                 |   12 
 drivers/scsi/sg.c                                  |    9 
 fs/aio.c                                           |   38 --
 fs/io_uring.c                                      |   20 -
 fs/read_write.c                                    |  362 +--------------------
 fs/splice.c                                        |   57 ---
 include/linux/compat.h                             |   24 -
 include/linux/fs.h                                 |   11 
 include/linux/uio.h                                |   10 
 include/uapi/asm-generic/unistd.h                  |   12 
 lib/iov_iter.c                                     |  161 +++++++--
 mm/process_vm_access.c                             |   85 ----
 net/compat.c                                       |    4 
 security/keys/compat.c                             |   37 --
 security/keys/internal.h                           |    5 
 security/keys/keyctl.c                             |    2 
 tools/include/uapi/asm-generic/unistd.h            |   12 
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |   10 
 tools/perf/arch/s390/entry/syscalls/syscall.tbl    |   10 
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |   10 
 30 files changed, 280 insertions(+), 706 deletions(-)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 1/9] compat.h: fix a spelling error in <linux/compat.h>
  2020-09-25  4:51 ` let import_iovec deal with compat_iovecs as well v4 Christoph Hellwig
@ 2020-09-25  4:51   ` Christoph Hellwig
  2020-09-25  4:51   ` [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c Christoph Hellwig
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-25  4:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

There is no compat_sys_readv64v2 syscall, only a compat_sys_preadv64v2
one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/compat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index b354ce58966e2d..654c1ec36671a4 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -812,7 +812,7 @@ asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
 		const struct compat_iovec __user *vec,
 		compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
-asmlinkage long  compat_sys_readv64v2(unsigned long fd,
+asmlinkage long  compat_sys_preadv64v2(unsigned long fd,
 		const struct compat_iovec __user *vec,
 		unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
  2020-09-25  4:51 ` let import_iovec deal with compat_iovecs as well v4 Christoph Hellwig
  2020-09-25  4:51   ` [PATCH 1/9] compat.h: fix a spelling error in <linux/compat.h> Christoph Hellwig
@ 2020-09-25  4:51   ` Christoph Hellwig
  2020-10-21 16:13     ` Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c" Greg KH
  2020-09-25  4:51   ` [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec Christoph Hellwig
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-25  4:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module, David Laight,
	David Laight

From: David Laight <David.Laight@ACULAB.COM>

This lets the compiler inline it into import_iovec() generating
much better code.

Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 179 ------------------------------------------------
 lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 176 insertions(+), 179 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c78d0dd..e5e891a88442ef 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -752,185 +752,6 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
 	return ret;
 }
 
-/**
- * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
- *     into the kernel and check that it is valid.
- *
- * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
- * @uvector: Pointer to the userspace array.
- * @nr_segs: Number of elements in userspace array.
- * @fast_segs: Number of elements in @fast_pointer.
- * @fast_pointer: Pointer to (usually small on-stack) kernel array.
- * @ret_pointer: (output parameter) Pointer to a variable that will point to
- *     either @fast_pointer, a newly allocated kernel array, or NULL,
- *     depending on which array was used.
- *
- * This function copies an array of &struct iovec of @nr_segs from
- * userspace into the kernel and checks that each element is valid (e.g.
- * it does not point to a kernel address or cause overflow by being too
- * large, etc.).
- *
- * As an optimization, the caller may provide a pointer to a small
- * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
- * (the size of this array, or 0 if unused, should be given in @fast_segs).
- *
- * @ret_pointer will always point to the array that was used, so the
- * caller must take care not to call kfree() on it e.g. in case the
- * @fast_pointer array was used and it was allocated on the stack.
- *
- * Return: The total number of bytes covered by the iovec array on success
- *   or a negative error code on error.
- */
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
-			      unsigned long nr_segs, unsigned long fast_segs,
-			      struct iovec *fast_pointer,
-			      struct iovec **ret_pointer)
-{
-	unsigned long seg;
-	ssize_t ret;
-	struct iovec *iov = fast_pointer;
-
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs = 0) {
-		ret = 0;
-		goto out;
-	}
-
-	/*
-	 * First get the "struct iovec" from user memory and
-	 * verify all the pointers
-	 */
-	if (nr_segs > UIO_MAXIOV) {
-		ret = -EINVAL;
-		goto out;
-	}
-	if (nr_segs > fast_segs) {
-		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL) {
-			ret = -ENOMEM;
-			goto out;
-		}
-	}
-	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	/*
-	 * According to the Single Unix Specification we should return EINVAL
-	 * if an element length is < 0 when cast to ssize_t or if the
-	 * total length would overflow the ssize_t return value of the
-	 * system call.
-	 *
-	 * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
-	 * overflow case.
-	 */
-	ret = 0;
-	for (seg = 0; seg < nr_segs; seg++) {
-		void __user *buf = iov[seg].iov_base;
-		ssize_t len = (ssize_t)iov[seg].iov_len;
-
-		/* see if we we're about to use an invalid len or if
-		 * it's about to overflow ssize_t */
-		if (len < 0) {
-			ret = -EINVAL;
-			goto out;
-		}
-		if (type >= 0
-		    && unlikely(!access_ok(buf, len))) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - ret) {
-			len = MAX_RW_COUNT - ret;
-			iov[seg].iov_len = len;
-		}
-		ret += len;
-	}
-out:
-	*ret_pointer = iov;
-	return ret;
-}
-
-#ifdef CONFIG_COMPAT
-ssize_t compat_rw_copy_check_uvector(int type,
-		const struct compat_iovec __user *uvector, unsigned long nr_segs,
-		unsigned long fast_segs, struct iovec *fast_pointer,
-		struct iovec **ret_pointer)
-{
-	compat_ssize_t tot_len;
-	struct iovec *iov = *ret_pointer = fast_pointer;
-	ssize_t ret = 0;
-	int seg;
-
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs = 0)
-		goto out;
-
-	ret = -EINVAL;
-	if (nr_segs > UIO_MAXIOV)
-		goto out;
-	if (nr_segs > fast_segs) {
-		ret = -ENOMEM;
-		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL)
-			goto out;
-	}
-	*ret_pointer = iov;
-
-	ret = -EFAULT;
-	if (!access_ok(uvector, nr_segs*sizeof(*uvector)))
-		goto out;
-
-	/*
-	 * Single unix specification:
-	 * We should -EINVAL if an element length is not >= 0 and fitting an
-	 * ssize_t.
-	 *
-	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
-	 * no overflow possibility.
-	 */
-	tot_len = 0;
-	ret = -EINVAL;
-	for (seg = 0; seg < nr_segs; seg++) {
-		compat_uptr_t buf;
-		compat_ssize_t len;
-
-		if (__get_user(len, &uvector->iov_len) ||
-		   __get_user(buf, &uvector->iov_base)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
-			goto out;
-		if (type >= 0 &&
-		    !access_ok(compat_ptr(buf), len)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - tot_len)
-			len = MAX_RW_COUNT - tot_len;
-		tot_len += len;
-		iov->iov_base = compat_ptr(buf);
-		iov->iov_len = (compat_size_t) len;
-		uvector++;
-		iov++;
-	}
-	ret = tot_len;
-
-out:
-	return ret;
-}
-#endif
-
 static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
 		loff_t *pos, rwf_t flags)
 {
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5e40786c8f1232..ccea9db3f72be8 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1650,6 +1650,109 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 }
 EXPORT_SYMBOL(dup_iter);
 
+/**
+ * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
+ *     into the kernel and check that it is valid.
+ *
+ * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
+ * @uvector: Pointer to the userspace array.
+ * @nr_segs: Number of elements in userspace array.
+ * @fast_segs: Number of elements in @fast_pointer.
+ * @fast_pointer: Pointer to (usually small on-stack) kernel array.
+ * @ret_pointer: (output parameter) Pointer to a variable that will point to
+ *     either @fast_pointer, a newly allocated kernel array, or NULL,
+ *     depending on which array was used.
+ *
+ * This function copies an array of &struct iovec of @nr_segs from
+ * userspace into the kernel and checks that each element is valid (e.g.
+ * it does not point to a kernel address or cause overflow by being too
+ * large, etc.).
+ *
+ * As an optimization, the caller may provide a pointer to a small
+ * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
+ * (the size of this array, or 0 if unused, should be given in @fast_segs).
+ *
+ * @ret_pointer will always point to the array that was used, so the
+ * caller must take care not to call kfree() on it e.g. in case the
+ * @fast_pointer array was used and it was allocated on the stack.
+ *
+ * Return: The total number of bytes covered by the iovec array on success
+ *   or a negative error code on error.
+ */
+ssize_t rw_copy_check_uvector(int type, const struct iovec __user *uvector,
+		unsigned long nr_segs, unsigned long fast_segs,
+		struct iovec *fast_pointer, struct iovec **ret_pointer)
+{
+	unsigned long seg;
+	ssize_t ret;
+	struct iovec *iov = fast_pointer;
+
+	/*
+	 * SuS says "The readv() function *may* fail if the iovcnt argument
+	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
+	 * traditionally returned zero for zero segments, so...
+	 */
+	if (nr_segs = 0) {
+		ret = 0;
+		goto out;
+	}
+
+	/*
+	 * First get the "struct iovec" from user memory and
+	 * verify all the pointers
+	 */
+	if (nr_segs > UIO_MAXIOV) {
+		ret = -EINVAL;
+		goto out;
+	}
+	if (nr_segs > fast_segs) {
+		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
+		if (iov = NULL) {
+			ret = -ENOMEM;
+			goto out;
+		}
+	}
+	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	/*
+	 * According to the Single Unix Specification we should return EINVAL
+	 * if an element length is < 0 when cast to ssize_t or if the
+	 * total length would overflow the ssize_t return value of the
+	 * system call.
+	 *
+	 * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
+	 * overflow case.
+	 */
+	ret = 0;
+	for (seg = 0; seg < nr_segs; seg++) {
+		void __user *buf = iov[seg].iov_base;
+		ssize_t len = (ssize_t)iov[seg].iov_len;
+
+		/* see if we we're about to use an invalid len or if
+		 * it's about to overflow ssize_t */
+		if (len < 0) {
+			ret = -EINVAL;
+			goto out;
+		}
+		if (type >= 0
+		    && unlikely(!access_ok(buf, len))) {
+			ret = -EFAULT;
+			goto out;
+		}
+		if (len > MAX_RW_COUNT - ret) {
+			len = MAX_RW_COUNT - ret;
+			iov[seg].iov_len = len;
+		}
+		ret += len;
+	}
+out:
+	*ret_pointer = iov;
+	return ret;
+}
+
 /**
  * import_iovec() - Copy an array of &struct iovec from userspace
  *     into the kernel, check that it is valid, and initialize a new
@@ -1695,6 +1798,79 @@ EXPORT_SYMBOL(import_iovec);
 #ifdef CONFIG_COMPAT
 #include <linux/compat.h>
 
+ssize_t compat_rw_copy_check_uvector(int type,
+		const struct compat_iovec __user *uvector,
+		unsigned long nr_segs, unsigned long fast_segs,
+		struct iovec *fast_pointer, struct iovec **ret_pointer)
+{
+	compat_ssize_t tot_len;
+	struct iovec *iov = *ret_pointer = fast_pointer;
+	ssize_t ret = 0;
+	int seg;
+
+	/*
+	 * SuS says "The readv() function *may* fail if the iovcnt argument
+	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
+	 * traditionally returned zero for zero segments, so...
+	 */
+	if (nr_segs = 0)
+		goto out;
+
+	ret = -EINVAL;
+	if (nr_segs > UIO_MAXIOV)
+		goto out;
+	if (nr_segs > fast_segs) {
+		ret = -ENOMEM;
+		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
+		if (iov = NULL)
+			goto out;
+	}
+	*ret_pointer = iov;
+
+	ret = -EFAULT;
+	if (!access_ok(uvector, nr_segs*sizeof(*uvector)))
+		goto out;
+
+	/*
+	 * Single unix specification:
+	 * We should -EINVAL if an element length is not >= 0 and fitting an
+	 * ssize_t.
+	 *
+	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
+	 * no overflow possibility.
+	 */
+	tot_len = 0;
+	ret = -EINVAL;
+	for (seg = 0; seg < nr_segs; seg++) {
+		compat_uptr_t buf;
+		compat_ssize_t len;
+
+		if (__get_user(len, &uvector->iov_len) ||
+		   __get_user(buf, &uvector->iov_base)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
+			goto out;
+		if (type >= 0 &&
+		    !access_ok(compat_ptr(buf), len)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		if (len > MAX_RW_COUNT - tot_len)
+			len = MAX_RW_COUNT - tot_len;
+		tot_len += len;
+		iov->iov_base = compat_ptr(buf);
+		iov->iov_len = (compat_size_t) len;
+		uvector++;
+		iov++;
+	}
+	ret = tot_len;
+
+out:
+	return ret;
+}
+
 ssize_t compat_import_iovec(int type,
 		const struct compat_iovec __user * uvector,
 		unsigned nr_segs, unsigned fast_segs,
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec
  2020-09-25  4:51 ` let import_iovec deal with compat_iovecs as well v4 Christoph Hellwig
  2020-09-25  4:51   ` [PATCH 1/9] compat.h: fix a spelling error in <linux/compat.h> Christoph Hellwig
  2020-09-25  4:51   ` [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c Christoph Hellwig
@ 2020-09-25  4:51   ` Christoph Hellwig
  2020-09-25  4:51   ` [PATCH 4/9] iov_iter: transparently handle compat iovecs in import_iovec Christoph Hellwig
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-25  4:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Split rw_copy_check_uvector into two new helpers with more sensible
calling conventions:

 - iovec_from_user copies a iovec from userspace either into the provided
   stack buffer if it fits, or allocates a new buffer for it.  Returns
   the actually used iovec.  It also verifies that iov_len does fit a
   signed type, and handles compat iovecs if the compat flag is set.
 - __import_iovec consolidates the native and compat versions of
   import_iovec. It calls iovec_from_user, then validates each iovec
   actually points to user addresses, and ensures the total length
   doesn't overflow.

This has two major implications:

 - the access_process_vm case loses the total lenght checking, which
   wasn't required anyway, given that each call receives two iovecs
   for the local and remote side of the operation, and it verifies
   the total length on the local side already.
 - instead of a single loop there now are two loops over the iovecs.
   Given that the iovecs are cache hot this doesn't make a major
   difference

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/compat.h |   6 -
 include/linux/fs.h     |  13 --
 include/linux/uio.h    |  12 +-
 lib/iov_iter.c         | 300 ++++++++++++++++-------------------------
 mm/process_vm_access.c |  34 +++--
 5 files changed, 138 insertions(+), 227 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 654c1ec36671a4..b930de791ff16b 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -451,12 +451,6 @@ extern long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 
 struct epoll_event;	/* fortunately, this one is fixed-layout */
 
-extern ssize_t compat_rw_copy_check_uvector(int type,
-		const struct compat_iovec __user *uvector,
-		unsigned long nr_segs,
-		unsigned long fast_segs, struct iovec *fast_pointer,
-		struct iovec **ret_pointer);
-
 extern void __user *compat_alloc_user_space(unsigned long len);
 
 int compat_restore_altstack(const compat_stack_t __user *uss);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a082c..e69b45b6cc7b5f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -178,14 +178,6 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File supports async buffered reads */
 #define FMODE_BUF_RASYNC	((__force fmode_t)0x40000000)
 
-/*
- * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
- * that indicates that they should check the contents of the iovec are
- * valid, but not check the memory that the iovec elements
- * points too.
- */
-#define CHECK_IOVEC_ONLY -1
-
 /*
  * Attribute flags.  These should be or-ed together to figure out what
  * has been changed!
@@ -1887,11 +1879,6 @@ static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
 	return file->f_op->mmap(file, vma);
 }
 
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
-			      unsigned long nr_segs, unsigned long fast_segs,
-			      struct iovec *fast_pointer,
-			      struct iovec **ret_pointer);
-
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 3835a8a8e9eae0..92c11fe41c6228 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -266,9 +266,15 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct
 size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
 		struct iov_iter *i);
 
-ssize_t import_iovec(int type, const struct iovec __user * uvector,
-		 unsigned nr_segs, unsigned fast_segs,
-		 struct iovec **iov, struct iov_iter *i);
+struct iovec *iovec_from_user(const struct iovec __user *uvector,
+		unsigned long nr_segs, unsigned long fast_segs,
+		struct iovec *fast_iov, bool compat);
+ssize_t import_iovec(int type, const struct iovec __user *uvec,
+		 unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+		 struct iov_iter *i);
+ssize_t __import_iovec(int type, const struct iovec __user *uvec,
+		 unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+		 struct iov_iter *i, bool compat);
 
 #ifdef CONFIG_COMPAT
 struct compat_iovec;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ccea9db3f72be8..d5d8afe31fca16 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -7,6 +7,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/splice.h>
+#include <linux/compat.h>
 #include <net/checksum.h>
 #include <linux/scatterlist.h>
 #include <linux/instrumented.h>
@@ -1650,107 +1651,133 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 }
 EXPORT_SYMBOL(dup_iter);
 
-/**
- * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
- *     into the kernel and check that it is valid.
- *
- * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
- * @uvector: Pointer to the userspace array.
- * @nr_segs: Number of elements in userspace array.
- * @fast_segs: Number of elements in @fast_pointer.
- * @fast_pointer: Pointer to (usually small on-stack) kernel array.
- * @ret_pointer: (output parameter) Pointer to a variable that will point to
- *     either @fast_pointer, a newly allocated kernel array, or NULL,
- *     depending on which array was used.
- *
- * This function copies an array of &struct iovec of @nr_segs from
- * userspace into the kernel and checks that each element is valid (e.g.
- * it does not point to a kernel address or cause overflow by being too
- * large, etc.).
- *
- * As an optimization, the caller may provide a pointer to a small
- * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
- * (the size of this array, or 0 if unused, should be given in @fast_segs).
- *
- * @ret_pointer will always point to the array that was used, so the
- * caller must take care not to call kfree() on it e.g. in case the
- * @fast_pointer array was used and it was allocated on the stack.
- *
- * Return: The total number of bytes covered by the iovec array on success
- *   or a negative error code on error.
- */
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user *uvector,
-		unsigned long nr_segs, unsigned long fast_segs,
-		struct iovec *fast_pointer, struct iovec **ret_pointer)
+static int copy_compat_iovec_from_user(struct iovec *iov,
+		const struct iovec __user *uvec, unsigned long nr_segs)
+{
+	const struct compat_iovec __user *uiov +		(const struct compat_iovec __user *)uvec;
+	int ret = -EFAULT, i;
+
+	if (!user_access_begin(uvec, nr_segs * sizeof(*uvec)))
+		return -EFAULT;
+
+	for (i = 0; i < nr_segs; i++) {
+		compat_uptr_t buf;
+		compat_ssize_t len;
+
+		unsafe_get_user(len, &uiov[i].iov_len, uaccess_end);
+		unsafe_get_user(buf, &uiov[i].iov_base, uaccess_end);
+
+		/* check for compat_size_t not fitting in compat_ssize_t .. */
+		if (len < 0) {
+			ret = -EINVAL;
+			goto uaccess_end;
+		}
+		iov[i].iov_base = compat_ptr(buf);
+		iov[i].iov_len = len;
+	}
+
+	ret = 0;
+uaccess_end:
+	user_access_end();
+	return ret;
+}
+		
+static int copy_iovec_from_user(struct iovec *iov,
+		const struct iovec __user *uvec, unsigned long nr_segs)
 {
 	unsigned long seg;
-	ssize_t ret;
-	struct iovec *iov = fast_pointer;
 
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs = 0) {
-		ret = 0;
-		goto out;
+	if (copy_from_user(iov, uvec, nr_segs * sizeof(*uvec)))
+		return -EFAULT;
+	for (seg = 0; seg < nr_segs; seg++) {
+		if ((ssize_t)iov[seg].iov_len < 0)
+			return -EINVAL;
 	}
 
+	return 0;
+}
+
+struct iovec *iovec_from_user(const struct iovec __user *uvec,
+		unsigned long nr_segs, unsigned long fast_segs,
+		struct iovec *fast_iov, bool compat)
+{
+	struct iovec *iov = fast_iov;
+	int ret;
+
 	/*
-	 * First get the "struct iovec" from user memory and
-	 * verify all the pointers
+	 * SuS says "The readv() function *may* fail if the iovcnt argument was
+	 * less than or equal to 0, or greater than {IOV_MAX}.  Linux has
+	 * traditionally returned zero for zero segments, so...
 	 */
-	if (nr_segs > UIO_MAXIOV) {
-		ret = -EINVAL;
-		goto out;
-	}
+	if (nr_segs = 0)
+		return iov;
+	if (nr_segs > UIO_MAXIOV)
+		return ERR_PTR(-EINVAL);
 	if (nr_segs > fast_segs) {
 		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		if (!iov)
+			return ERR_PTR(-ENOMEM);
 	}
-	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
-		ret = -EFAULT;
-		goto out;
+
+	if (compat)
+		ret = copy_compat_iovec_from_user(iov, uvec, nr_segs);
+	else
+		ret = copy_iovec_from_user(iov, uvec, nr_segs);
+	if (ret) {
+		if (iov != fast_iov)
+			kfree(iov);
+		return ERR_PTR(ret);
+	}
+
+	return iov;
+}
+
+ssize_t __import_iovec(int type, const struct iovec __user *uvec,
+		 unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+		 struct iov_iter *i, bool compat)
+{
+	ssize_t total_len = 0;
+	unsigned long seg;
+	struct iovec *iov;
+
+	iov = iovec_from_user(uvec, nr_segs, fast_segs, *iovp, compat);
+	if (IS_ERR(iov)) {
+		*iovp = NULL;
+		return PTR_ERR(iov);
 	}
 
 	/*
-	 * According to the Single Unix Specification we should return EINVAL
-	 * if an element length is < 0 when cast to ssize_t or if the
-	 * total length would overflow the ssize_t return value of the
-	 * system call.
+	 * According to the Single Unix Specification we should return EINVAL if
+	 * an element length is < 0 when cast to ssize_t or if the total length
+	 * would overflow the ssize_t return value of the system call.
 	 *
 	 * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
 	 * overflow case.
 	 */
-	ret = 0;
 	for (seg = 0; seg < nr_segs; seg++) {
-		void __user *buf = iov[seg].iov_base;
 		ssize_t len = (ssize_t)iov[seg].iov_len;
 
-		/* see if we we're about to use an invalid len or if
-		 * it's about to overflow ssize_t */
-		if (len < 0) {
-			ret = -EINVAL;
-			goto out;
+		if (!access_ok(iov[seg].iov_base, len)) {
+			if (iov != *iovp)
+				kfree(iov);
+			*iovp = NULL;
+			return -EFAULT;
 		}
-		if (type >= 0
-		    && unlikely(!access_ok(buf, len))) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - ret) {
-			len = MAX_RW_COUNT - ret;
+
+		if (len > MAX_RW_COUNT - total_len) {
+			len = MAX_RW_COUNT - total_len;
 			iov[seg].iov_len = len;
 		}
-		ret += len;
+		total_len += len;
 	}
-out:
-	*ret_pointer = iov;
-	return ret;
+
+	iov_iter_init(i, type, iov, nr_segs, total_len);
+	if (iov = *iovp)
+		*iovp = NULL;
+	else
+		*iovp = iov;
+	return total_len;
 }
 
 /**
@@ -1759,10 +1786,10 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user *uvector,
  *     &struct iov_iter iterator to access it.
  *
  * @type: One of %READ or %WRITE.
- * @uvector: Pointer to the userspace array.
+ * @uvec: Pointer to the userspace array.
  * @nr_segs: Number of elements in userspace array.
  * @fast_segs: Number of elements in @iov.
- * @iov: (input and output parameter) Pointer to pointer to (usually small
+ * @iovp: (input and output parameter) Pointer to pointer to (usually small
  *     on-stack) kernel array.
  * @i: Pointer to iterator that will be initialized on success.
  *
@@ -1775,120 +1802,21 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user *uvector,
  *
  * Return: Negative error code on error, bytes imported on success
  */
-ssize_t import_iovec(int type, const struct iovec __user * uvector,
+ssize_t import_iovec(int type, const struct iovec __user *uvec,
 		 unsigned nr_segs, unsigned fast_segs,
-		 struct iovec **iov, struct iov_iter *i)
+		 struct iovec **iovp, struct iov_iter *i)
 {
-	ssize_t n;
-	struct iovec *p;
-	n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
-				  *iov, &p);
-	if (n < 0) {
-		if (p != *iov)
-			kfree(p);
-		*iov = NULL;
-		return n;
-	}
-	iov_iter_init(i, type, p, nr_segs, n);
-	*iov = p = *iov ? NULL : p;
-	return n;
+	return __import_iovec(type, uvec, nr_segs, fast_segs, iovp, i, false);
 }
 EXPORT_SYMBOL(import_iovec);
 
 #ifdef CONFIG_COMPAT
-#include <linux/compat.h>
-
-ssize_t compat_rw_copy_check_uvector(int type,
-		const struct compat_iovec __user *uvector,
-		unsigned long nr_segs, unsigned long fast_segs,
-		struct iovec *fast_pointer, struct iovec **ret_pointer)
-{
-	compat_ssize_t tot_len;
-	struct iovec *iov = *ret_pointer = fast_pointer;
-	ssize_t ret = 0;
-	int seg;
-
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs = 0)
-		goto out;
-
-	ret = -EINVAL;
-	if (nr_segs > UIO_MAXIOV)
-		goto out;
-	if (nr_segs > fast_segs) {
-		ret = -ENOMEM;
-		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov = NULL)
-			goto out;
-	}
-	*ret_pointer = iov;
-
-	ret = -EFAULT;
-	if (!access_ok(uvector, nr_segs*sizeof(*uvector)))
-		goto out;
-
-	/*
-	 * Single unix specification:
-	 * We should -EINVAL if an element length is not >= 0 and fitting an
-	 * ssize_t.
-	 *
-	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
-	 * no overflow possibility.
-	 */
-	tot_len = 0;
-	ret = -EINVAL;
-	for (seg = 0; seg < nr_segs; seg++) {
-		compat_uptr_t buf;
-		compat_ssize_t len;
-
-		if (__get_user(len, &uvector->iov_len) ||
-		   __get_user(buf, &uvector->iov_base)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
-			goto out;
-		if (type >= 0 &&
-		    !access_ok(compat_ptr(buf), len)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - tot_len)
-			len = MAX_RW_COUNT - tot_len;
-		tot_len += len;
-		iov->iov_base = compat_ptr(buf);
-		iov->iov_len = (compat_size_t) len;
-		uvector++;
-		iov++;
-	}
-	ret = tot_len;
-
-out:
-	return ret;
-}
-
-ssize_t compat_import_iovec(int type,
-		const struct compat_iovec __user * uvector,
-		unsigned nr_segs, unsigned fast_segs,
-		struct iovec **iov, struct iov_iter *i)
+ssize_t compat_import_iovec(int type, const struct compat_iovec __user *uvec,
+		unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+		struct iov_iter *i)
 {
-	ssize_t n;
-	struct iovec *p;
-	n = compat_rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
-				  *iov, &p);
-	if (n < 0) {
-		if (p != *iov)
-			kfree(p);
-		*iov = NULL;
-		return n;
-	}
-	iov_iter_init(i, type, p, nr_segs, n);
-	*iov = p = *iov ? NULL : p;
-	return n;
+	return __import_iovec(type, (const struct iovec __user *)uvec, nr_segs,
+			     fast_segs, iovp, i, true);
 }
 EXPORT_SYMBOL(compat_import_iovec);
 #endif
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 29c052099affdc..5e728c20c2bead 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -276,20 +276,17 @@ static ssize_t process_vm_rw(pid_t pid,
 	if (rc < 0)
 		return rc;
 	if (!iov_iter_count(&iter))
-		goto free_iovecs;
-
-	rc = rw_copy_check_uvector(CHECK_IOVEC_ONLY, rvec, riovcnt, UIO_FASTIOV,
-				   iovstack_r, &iov_r);
-	if (rc <= 0)
-		goto free_iovecs;
-
+		goto free_iov_l;
+	iov_r = iovec_from_user(rvec, riovcnt, UIO_FASTIOV, iovstack_r, false);
+	if (IS_ERR(iov_r)) {
+		rc = PTR_ERR(iov_r);
+		goto free_iov_l;
+	}
 	rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
-
-free_iovecs:
 	if (iov_r != iovstack_r)
 		kfree(iov_r);
+free_iov_l:
 	kfree(iov_l);
-
 	return rc;
 }
 
@@ -333,18 +330,17 @@ compat_process_vm_rw(compat_pid_t pid,
 	if (rc < 0)
 		return rc;
 	if (!iov_iter_count(&iter))
-		goto free_iovecs;
-	rc = compat_rw_copy_check_uvector(CHECK_IOVEC_ONLY, rvec, riovcnt,
-					  UIO_FASTIOV, iovstack_r,
-					  &iov_r);
-	if (rc <= 0)
-		goto free_iovecs;
-
+		goto free_iov_l;
+	iov_r = iovec_from_user((const struct iovec __user *)rvec, riovcnt,
+				UIO_FASTIOV, iovstack_r, true);
+	if (IS_ERR(iov_r)) {
+		rc = PTR_ERR(iov_r);
+		goto free_iov_l;
+	}
 	rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
-
-free_iovecs:
 	if (iov_r != iovstack_r)
 		kfree(iov_r);
+free_iov_l:
 	kfree(iov_l);
 	return rc;
 }
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 4/9] iov_iter: transparently handle compat iovecs in import_iovec
  2020-09-25  4:51 ` let import_iovec deal with compat_iovecs as well v4 Christoph Hellwig
                     ` (2 preceding siblings ...)
  2020-09-25  4:51   ` [PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec Christoph Hellwig
@ 2020-09-25  4:51   ` Christoph Hellwig
  2020-09-25  4:51   ` [PATCH 5/9] fs: remove various compat readv/writev helpers Christoph Hellwig
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-25  4:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Use in compat_syscall to import either native or the compat iovecs, and
remove the now superflous compat_import_iovec.

This removes the need for special compat logic in most callers, and
the remaining ones can still be simplified by using __import_iovec
with a bool compat parameter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/scsi_ioctl.c     | 12 ++----------
 drivers/scsi/sg.c      |  9 +--------
 fs/aio.c               |  8 ++------
 fs/io_uring.c          | 20 ++++++++------------
 fs/read_write.c        |  6 ++++--
 fs/splice.c            |  2 +-
 include/linux/uio.h    |  8 --------
 lib/iov_iter.c         | 14 ++------------
 mm/process_vm_access.c |  3 ++-
 net/compat.c           |  4 ++--
 security/keys/compat.c |  5 ++---
 11 files changed, 26 insertions(+), 65 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index ef722f04f88a93..e08df86866ee5d 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -333,16 +333,8 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
 		struct iov_iter i;
 		struct iovec *iov = NULL;
 
-#ifdef CONFIG_COMPAT
-		if (in_compat_syscall())
-			ret = compat_import_iovec(rq_data_dir(rq),
-				   hdr->dxferp, hdr->iovec_count,
-				   0, &iov, &i);
-		else
-#endif
-			ret = import_iovec(rq_data_dir(rq),
-				   hdr->dxferp, hdr->iovec_count,
-				   0, &iov, &i);
+		ret = import_iovec(rq_data_dir(rq), hdr->dxferp,
+				   hdr->iovec_count, 0, &iov, &i);
 		if (ret < 0)
 			goto out_free_cdb;
 
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 20472aaaf630a4..bfa8d77322d732 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1820,14 +1820,7 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
 		struct iovec *iov = NULL;
 		struct iov_iter i;
 
-#ifdef CONFIG_COMPAT
-		if (in_compat_syscall())
-			res = compat_import_iovec(rw, hp->dxferp, iov_count,
-						  0, &iov, &i);
-		else
-#endif
-			res = import_iovec(rw, hp->dxferp, iov_count,
-					   0, &iov, &i);
+		res = import_iovec(rw, hp->dxferp, iov_count, 0, &iov, &i);
 		if (res < 0)
 			return res;
 
diff --git a/fs/aio.c b/fs/aio.c
index d5ec303855669d..c45c20d875388c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1489,12 +1489,8 @@ static ssize_t aio_setup_rw(int rw, const struct iocb *iocb,
 		*iovec = NULL;
 		return ret;
 	}
-#ifdef CONFIG_COMPAT
-	if (compat)
-		return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec,
-				iter);
-#endif
-	return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
+
+	return __import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter, compat);
 }
 
 static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8b426aa29668cb..8c27dc28da182a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2852,13 +2852,8 @@ static ssize_t __io_import_iovec(int rw, struct io_kiocb *req,
 		return ret;
 	}
 
-#ifdef CONFIG_COMPAT
-	if (req->ctx->compat)
-		return compat_import_iovec(rw, buf, sqe_len, UIO_FASTIOV,
-						iovec, iter);
-#endif
-
-	return import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter);
+	return __import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter,
+			      req->ctx->compat);
 }
 
 static ssize_t io_import_iovec(int rw, struct io_kiocb *req,
@@ -4200,8 +4195,9 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
 				sr->len);
 		iomsg->iov = NULL;
 	} else {
-		ret = import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
-					&iomsg->iov, &iomsg->msg.msg_iter);
+		ret = __import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
+				     &iomsg->iov, &iomsg->msg.msg_iter,
+				     false);
 		if (ret > 0)
 			ret = 0;
 	}
@@ -4241,9 +4237,9 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
 		sr->len = iomsg->iov[0].iov_len;
 		iomsg->iov = NULL;
 	} else {
-		ret = compat_import_iovec(READ, uiov, len, UIO_FASTIOV,
-						&iomsg->iov,
-						&iomsg->msg.msg_iter);
+		ret = __import_iovec(READ, (struct iovec __user *)uiov, len,
+				   UIO_FASTIOV, &iomsg->iov,
+				   &iomsg->msg.msg_iter, true);
 		if (ret < 0)
 			return ret;
 	}
diff --git a/fs/read_write.c b/fs/read_write.c
index e5e891a88442ef..0a68037580b455 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1078,7 +1078,8 @@ static size_t compat_readv(struct file *file,
 	struct iov_iter iter;
 	ssize_t ret;
 
-	ret = compat_import_iovec(READ, vec, vlen, UIO_FASTIOV, &iov, &iter);
+	ret = import_iovec(READ, (const struct iovec __user *)vec, vlen,
+			   UIO_FASTIOV, &iov, &iter);
 	if (ret >= 0) {
 		ret = do_iter_read(file, &iter, pos, flags);
 		kfree(iov);
@@ -1186,7 +1187,8 @@ static size_t compat_writev(struct file *file,
 	struct iov_iter iter;
 	ssize_t ret;
 
-	ret = compat_import_iovec(WRITE, vec, vlen, UIO_FASTIOV, &iov, &iter);
+	ret = import_iovec(WRITE, (const struct iovec __user *)vec, vlen,
+			   UIO_FASTIOV, &iov, &iter);
 	if (ret >= 0) {
 		file_start_write(file);
 		ret = do_iter_write(file, &iter, pos, flags);
diff --git a/fs/splice.c b/fs/splice.c
index d7c8a7c4db07ff..132d42b9871f9b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1387,7 +1387,7 @@ COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, io
 	if (error)
 		return error;
 
-	error = compat_import_iovec(type, iov32, nr_segs,
+	error = import_iovec(type, (struct iovec __user *)iov32, nr_segs,
 			     ARRAY_SIZE(iovstack), &iov, &iter);
 	if (error >= 0) {
 		error = do_vmsplice(f.file, &iter, flags);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 92c11fe41c6228..daedc61ad3706e 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -275,14 +275,6 @@ ssize_t import_iovec(int type, const struct iovec __user *uvec,
 ssize_t __import_iovec(int type, const struct iovec __user *uvec,
 		 unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
 		 struct iov_iter *i, bool compat);
-
-#ifdef CONFIG_COMPAT
-struct compat_iovec;
-ssize_t compat_import_iovec(int type, const struct compat_iovec __user * uvector,
-		 unsigned nr_segs, unsigned fast_segs,
-		 struct iovec **iov, struct iov_iter *i);
-#endif
-
 int import_single_range(int type, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i);
 
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index d5d8afe31fca16..8c51e1b03814a3 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1806,21 +1806,11 @@ ssize_t import_iovec(int type, const struct iovec __user *uvec,
 		 unsigned nr_segs, unsigned fast_segs,
 		 struct iovec **iovp, struct iov_iter *i)
 {
-	return __import_iovec(type, uvec, nr_segs, fast_segs, iovp, i, false);
+	return __import_iovec(type, uvec, nr_segs, fast_segs, iovp, i,
+			      in_compat_syscall());
 }
 EXPORT_SYMBOL(import_iovec);
 
-#ifdef CONFIG_COMPAT
-ssize_t compat_import_iovec(int type, const struct compat_iovec __user *uvec,
-		unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
-		struct iov_iter *i)
-{
-	return __import_iovec(type, (const struct iovec __user *)uvec, nr_segs,
-			     fast_segs, iovp, i, true);
-}
-EXPORT_SYMBOL(compat_import_iovec);
-#endif
-
 int import_single_range(int rw, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i)
 {
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 5e728c20c2bead..3f2156aab44263 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -326,7 +326,8 @@ compat_process_vm_rw(compat_pid_t pid,
 	if (flags != 0)
 		return -EINVAL;
 
-	rc = compat_import_iovec(dir, lvec, liovcnt, UIO_FASTIOV, &iov_l, &iter);
+	rc = import_iovec(dir, (const struct iovec __user *)lvec, liovcnt,
+			  UIO_FASTIOV, &iov_l, &iter);
 	if (rc < 0)
 		return rc;
 	if (!iov_iter_count(&iter))
diff --git a/net/compat.c b/net/compat.c
index 95ce707a30a31d..ddd15af3a2837b 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -98,8 +98,8 @@ int get_compat_msghdr(struct msghdr *kmsg,
 	if (err)
 		return err;
 
-	err = compat_import_iovec(save_addr ? READ : WRITE, compat_ptr(ptr),
-				   len, UIO_FASTIOV, iov, &kmsg->msg_iter);
+	err = import_iovec(save_addr ? READ : WRITE, compat_ptr(ptr), len,
+			   UIO_FASTIOV, iov, &kmsg->msg_iter);
 	return err < 0 ? err : 0;
 }
 
diff --git a/security/keys/compat.c b/security/keys/compat.c
index 6ee9d8f6a4a5bb..7ae531db031cf8 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -33,9 +33,8 @@ static long compat_keyctl_instantiate_key_iov(
 	if (!_payload_iov)
 		ioc = 0;
 
-	ret = compat_import_iovec(WRITE, _payload_iov, ioc,
-				  ARRAY_SIZE(iovstack), &iov,
-				  &from);
+	ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
+			   ioc, ARRAY_SIZE(iovstack), &iov, &from);
 	if (ret < 0)
 		return ret;
 
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 5/9] fs: remove various compat readv/writev helpers
  2020-09-25  4:51 ` let import_iovec deal with compat_iovecs as well v4 Christoph Hellwig
                     ` (3 preceding siblings ...)
  2020-09-25  4:51   ` [PATCH 4/9] iov_iter: transparently handle compat iovecs in import_iovec Christoph Hellwig
@ 2020-09-25  4:51   ` Christoph Hellwig
  2020-09-25  4:51   ` [PATCH 6/9] fs: remove the compat readv/writev syscalls Christoph Hellwig
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-25  4:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs as well, all the duplicated
code in the compat readv/writev helpers is not needed.  Remove them
and switch the compat syscall handlers to use the native helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c        | 179 +++++++----------------------------------
 include/linux/compat.h |  20 ++---
 2 files changed, 40 insertions(+), 159 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 0a68037580b455..eab427b7cc0a3f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1068,226 +1068,107 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
 	return do_pwritev(fd, vec, vlen, pos, flags);
 }
 
+/*
+ * Various compat syscalls.  Note that they all pretend to take a native
+ * iovec - import_iovec will properly treat those as compat_iovecs based on
+ * in_compat_syscall().
+ */
 #ifdef CONFIG_COMPAT
-static size_t compat_readv(struct file *file,
-			   const struct compat_iovec __user *vec,
-			   unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
-	ssize_t ret;
-
-	ret = import_iovec(READ, (const struct iovec __user *)vec, vlen,
-			   UIO_FASTIOV, &iov, &iter);
-	if (ret >= 0) {
-		ret = do_iter_read(file, &iter, pos, flags);
-		kfree(iov);
-	}
-	if (ret > 0)
-		add_rchar(current, ret);
-	inc_syscr(current);
-	return ret;
-}
-
-static size_t do_compat_readv(compat_ulong_t fd,
-				 const struct compat_iovec __user *vec,
-				 compat_ulong_t vlen, rwf_t flags)
-{
-	struct fd f = fdget_pos(fd);
-	ssize_t ret;
-	loff_t pos;
-
-	if (!f.file)
-		return -EBADF;
-	pos = f.file->f_pos;
-	ret = compat_readv(f.file, vec, vlen, &pos, flags);
-	if (ret >= 0)
-		f.file->f_pos = pos;
-	fdput_pos(f);
-	return ret;
-
-}
-
 COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		compat_ulong_t, vlen)
 {
-	return do_compat_readv(fd, vec, vlen, 0);
-}
-
-static long do_compat_preadv64(unsigned long fd,
-				  const struct compat_iovec __user *vec,
-				  unsigned long vlen, loff_t pos, rwf_t flags)
-{
-	struct fd f;
-	ssize_t ret;
-
-	if (pos < 0)
-		return -EINVAL;
-	f = fdget(fd);
-	if (!f.file)
-		return -EBADF;
-	ret = -ESPIPE;
-	if (f.file->f_mode & FMODE_PREAD)
-		ret = compat_readv(f.file, vec, vlen, &pos, flags);
-	fdput(f);
-	return ret;
+	return do_readv(fd, vec, vlen, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return do_compat_preadv64(fd, vec, vlen, pos, 0);
+	return do_preadv(fd, vec, vlen, pos, 0);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return do_compat_preadv64(fd, vec, vlen, pos, 0);
+	return do_preadv(fd, vec, vlen, pos, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
 COMPAT_SYSCALL_DEFINE5(preadv64v2, unsigned long, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		unsigned long, vlen, loff_t, pos, rwf_t, flags)
 {
 	if (pos = -1)
-		return do_compat_readv(fd, vec, vlen, flags);
-
-	return do_compat_preadv64(fd, vec, vlen, pos, flags);
+		return do_readv(fd, vec, vlen, flags);
+	return do_preadv(fd, vec, vlen, pos, flags);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
 		rwf_t, flags)
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
 	if (pos = -1)
-		return do_compat_readv(fd, vec, vlen, flags);
-
-	return do_compat_preadv64(fd, vec, vlen, pos, flags);
-}
-
-static size_t compat_writev(struct file *file,
-			    const struct compat_iovec __user *vec,
-			    unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
-	ssize_t ret;
-
-	ret = import_iovec(WRITE, (const struct iovec __user *)vec, vlen,
-			   UIO_FASTIOV, &iov, &iter);
-	if (ret >= 0) {
-		file_start_write(file);
-		ret = do_iter_write(file, &iter, pos, flags);
-		file_end_write(file);
-		kfree(iov);
-	}
-	if (ret > 0)
-		add_wchar(current, ret);
-	inc_syscw(current);
-	return ret;
-}
-
-static size_t do_compat_writev(compat_ulong_t fd,
-				  const struct compat_iovec __user* vec,
-				  compat_ulong_t vlen, rwf_t flags)
-{
-	struct fd f = fdget_pos(fd);
-	ssize_t ret;
-	loff_t pos;
-
-	if (!f.file)
-		return -EBADF;
-	pos = f.file->f_pos;
-	ret = compat_writev(f.file, vec, vlen, &pos, flags);
-	if (ret >= 0)
-		f.file->f_pos = pos;
-	fdput_pos(f);
-	return ret;
+		return do_readv(fd, vec, vlen, flags);
+	return do_preadv(fd, vec, vlen, pos, flags);
 }
 
 COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
-		const struct compat_iovec __user *, vec,
+		const struct iovec __user *, vec,
 		compat_ulong_t, vlen)
 {
-	return do_compat_writev(fd, vec, vlen, 0);
-}
-
-static long do_compat_pwritev64(unsigned long fd,
-				   const struct compat_iovec __user *vec,
-				   unsigned long vlen, loff_t pos, rwf_t flags)
-{
-	struct fd f;
-	ssize_t ret;
-
-	if (pos < 0)
-		return -EINVAL;
-	f = fdget(fd);
-	if (!f.file)
-		return -EBADF;
-	ret = -ESPIPE;
-	if (f.file->f_mode & FMODE_PWRITE)
-		ret = compat_writev(f.file, vec, vlen, &pos, flags);
-	fdput(f);
-	return ret;
+	return do_writev(fd, vec, vlen, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64
 COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
+	return do_pwritev(fd, vec, vlen, pos, 0);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *,vec,
 		compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
+	return do_pwritev(fd, vec, vlen, pos, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64V2
 COMPAT_SYSCALL_DEFINE5(pwritev64v2, unsigned long, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *, vec,
 		unsigned long, vlen, loff_t, pos, rwf_t, flags)
 {
 	if (pos = -1)
-		return do_compat_writev(fd, vec, vlen, flags);
-
-	return do_compat_pwritev64(fd, vec, vlen, pos, flags);
+		return do_writev(fd, vec, vlen, flags);
+	return do_pwritev(fd, vec, vlen, pos, flags);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
+		const struct iovec __user *,vec,
 		compat_ulong_t, vlen, u32, pos_low, u32, pos_high, rwf_t, flags)
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
 	if (pos = -1)
-		return do_compat_writev(fd, vec, vlen, flags);
-
-	return do_compat_pwritev64(fd, vec, vlen, pos, flags);
+		return do_writev(fd, vec, vlen, flags);
+	return do_pwritev(fd, vec, vlen, pos, flags);
 }
-
-#endif
+#endif /* CONFIG_COMPAT */
 
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 		  	   size_t count, loff_t max)
diff --git a/include/linux/compat.h b/include/linux/compat.h
index b930de791ff16b..306ea7e1172d8d 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -546,25 +546,25 @@ asmlinkage long compat_sys_getdents(unsigned int fd,
 /* fs/read_write.c */
 asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);
 asmlinkage ssize_t compat_sys_readv(compat_ulong_t fd,
-		const struct compat_iovec __user *vec, compat_ulong_t vlen);
+		const struct iovec __user *vec, compat_ulong_t vlen);
 asmlinkage ssize_t compat_sys_writev(compat_ulong_t fd,
-		const struct compat_iovec __user *vec, compat_ulong_t vlen);
+		const struct iovec __user *vec, compat_ulong_t vlen);
 /* No generic prototype for pread64 and pwrite64 */
 asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
-		const struct compat_iovec __user *vec,
+		const struct iovec __user *vec,
 		compat_ulong_t vlen, u32 pos_low, u32 pos_high);
 asmlinkage ssize_t compat_sys_pwritev(compat_ulong_t fd,
-		const struct compat_iovec __user *vec,
+		const struct iovec __user *vec,
 		compat_ulong_t vlen, u32 pos_low, u32 pos_high);
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 asmlinkage long compat_sys_preadv64(unsigned long fd,
-		const struct compat_iovec __user *vec,
+		const struct iovec __user *vec,
 		unsigned long vlen, loff_t pos);
 #endif
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64
 asmlinkage long compat_sys_pwritev64(unsigned long fd,
-		const struct compat_iovec __user *vec,
+		const struct iovec __user *vec,
 		unsigned long vlen, loff_t pos);
 #endif
 
@@ -800,20 +800,20 @@ asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
 		     const compat_uptr_t __user *argv,
 		     const compat_uptr_t __user *envp, int flags);
 asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
-		const struct compat_iovec __user *vec,
+		const struct iovec __user *vec,
 		compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
 asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
-		const struct compat_iovec __user *vec,
+		const struct iovec __user *vec,
 		compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
 asmlinkage long  compat_sys_preadv64v2(unsigned long fd,
-		const struct compat_iovec __user *vec,
+		const struct iovec __user *vec,
 		unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64V2
 asmlinkage long compat_sys_pwritev64v2(unsigned long fd,
-		const struct compat_iovec __user *vec,
+		const struct iovec __user *vec,
 		unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
 
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 6/9] fs: remove the compat readv/writev syscalls
  2020-09-25  4:51 ` let import_iovec deal with compat_iovecs as well v4 Christoph Hellwig
                     ` (4 preceding siblings ...)
  2020-09-25  4:51   ` [PATCH 5/9] fs: remove various compat readv/writev helpers Christoph Hellwig
@ 2020-09-25  4:51   ` Christoph Hellwig
  2020-09-25  4:51   ` [PATCH 7/9] fs: remove compat_sys_vmsplice Christoph Hellwig
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-25  4:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native readv and writev
syscalls can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/unistd32.h                  |  4 ++--
 arch/mips/kernel/syscalls/syscall_n32.tbl          |  4 ++--
 arch/mips/kernel/syscalls/syscall_o32.tbl          |  4 ++--
 arch/parisc/kernel/syscalls/syscall.tbl            |  4 ++--
 arch/powerpc/kernel/syscalls/syscall.tbl           |  4 ++--
 arch/s390/kernel/syscalls/syscall.tbl              |  4 ++--
 arch/sparc/kernel/syscalls/syscall.tbl             |  4 ++--
 arch/x86/entry/syscall_x32.c                       |  2 ++
 arch/x86/entry/syscalls/syscall_32.tbl             |  4 ++--
 arch/x86/entry/syscalls/syscall_64.tbl             |  4 ++--
 fs/read_write.c                                    | 14 --------------
 include/linux/compat.h                             |  4 ----
 include/uapi/asm-generic/unistd.h                  |  4 ++--
 tools/include/uapi/asm-generic/unistd.h            |  4 ++--
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |  4 ++--
 tools/perf/arch/s390/entry/syscalls/syscall.tbl    |  4 ++--
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |  4 ++--
 17 files changed, 30 insertions(+), 46 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 734860ac7cf9d5..4a236493dca5b9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -301,9 +301,9 @@ __SYSCALL(__NR_flock, sys_flock)
 #define __NR_msync 144
 __SYSCALL(__NR_msync, sys_msync)
 #define __NR_readv 145
-__SYSCALL(__NR_readv, compat_sys_readv)
+__SYSCALL(__NR_readv, sys_readv)
 #define __NR_writev 146
-__SYSCALL(__NR_writev, compat_sys_writev)
+__SYSCALL(__NR_writev, sys_writev)
 #define __NR_getsid 147
 __SYSCALL(__NR_getsid, sys_getsid)
 #define __NR_fdatasync 148
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f9df9edb67a407..c99a92646f8ee9 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -25,8 +25,8 @@
 15	n32	ioctl				compat_sys_ioctl
 16	n32	pread64				sys_pread64
 17	n32	pwrite64			sys_pwrite64
-18	n32	readv				compat_sys_readv
-19	n32	writev				compat_sys_writev
+18	n32	readv				sys_readv
+19	n32	writev				sys_writev
 20	n32	access				sys_access
 21	n32	pipe				sysm_pipe
 22	n32	_newselect			compat_sys_select
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 195b43cf27c848..075064d10661bf 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -156,8 +156,8 @@
 142	o32	_newselect			sys_select			compat_sys_select
 143	o32	flock				sys_flock
 144	o32	msync				sys_msync
-145	o32	readv				sys_readv			compat_sys_readv
-146	o32	writev				sys_writev			compat_sys_writev
+145	o32	readv				sys_readv
+146	o32	writev				sys_writev
 147	o32	cacheflush			sys_cacheflush
 148	o32	cachectl			sys_cachectl
 149	o32	sysmips				__sys_sysmips
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index def64d221cd4fb..192abde0001d9d 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -159,8 +159,8 @@
 142	common	_newselect		sys_select			compat_sys_select
 143	common	flock			sys_flock
 144	common	msync			sys_msync
-145	common	readv			sys_readv			compat_sys_readv
-146	common	writev			sys_writev			compat_sys_writev
+145	common	readv			sys_readv
+146	common	writev			sys_writev
 147	common	getsid			sys_getsid
 148	common	fdatasync		sys_fdatasync
 149	common	_sysctl			sys_ni_syscall
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index c2d737ff2e7bec..6f1e2ecf0edad9 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -193,8 +193,8 @@
 142	common	_newselect			sys_select			compat_sys_select
 143	common	flock				sys_flock
 144	common	msync				sys_msync
-145	common	readv				sys_readv			compat_sys_readv
-146	common	writev				sys_writev			compat_sys_writev
+145	common	readv				sys_readv
+146	common	writev				sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
 149	nospu	_sysctl				sys_ni_syscall
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 10456bc936fb09..6101cf2e004cb4 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -134,8 +134,8 @@
 142  64		select			sys_select			-
 143  common	flock			sys_flock			sys_flock
 144  common	msync			sys_msync			sys_msync
-145  common	readv			sys_readv			compat_sys_readv
-146  common	writev			sys_writev			compat_sys_writev
+145  common	readv			sys_readv			sys_readv
+146  common	writev			sys_writev			sys_writev
 147  common	getsid			sys_getsid			sys_getsid
 148  common	fdatasync		sys_fdatasync			sys_fdatasync
 149  common	_sysctl			-				-
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 4af114e84f2022..a87ddb282ab16f 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -149,8 +149,8 @@
 117	common	getrusage		sys_getrusage			compat_sys_getrusage
 118	common	getsockopt		sys_getsockopt			sys_getsockopt
 119	common	getcwd			sys_getcwd
-120	common	readv			sys_readv			compat_sys_readv
-121	common	writev			sys_writev			compat_sys_writev
+120	common	readv			sys_readv
+121	common	writev			sys_writev
 122	common	settimeofday		sys_settimeofday		compat_sys_settimeofday
 123	32	fchown			sys_fchown16
 123	64	fchown			sys_fchown
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 1583831f61a9df..aa321444a41f63 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -12,6 +12,8 @@
  * Reuse the 64-bit entry points for the x32 versions that occupy different
  * slots in the syscall table.
  */
+#define __x32_sys_readv		__x64_sys_readv
+#define __x32_sys_writev	__x64_sys_writev
 #define __x32_sys_getsockopt	__x64_sys_getsockopt
 #define __x32_sys_setsockopt	__x64_sys_setsockopt
 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 9d11028736661b..54ab4beb517f25 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -156,8 +156,8 @@
 142	i386	_newselect		sys_select			compat_sys_select
 143	i386	flock			sys_flock
 144	i386	msync			sys_msync
-145	i386	readv			sys_readv			compat_sys_readv
-146	i386	writev			sys_writev			compat_sys_writev
+145	i386	readv			sys_readv
+146	i386	writev			sys_writev
 147	i386	getsid			sys_getsid
 148	i386	fdatasync		sys_fdatasync
 149	i386	_sysctl			sys_ni_syscall
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index f30d6ae9a6883c..b1e59957c5c51c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -371,8 +371,8 @@
 512	x32	rt_sigaction		compat_sys_rt_sigaction
 513	x32	rt_sigreturn		compat_sys_x32_rt_sigreturn
 514	x32	ioctl			compat_sys_ioctl
-515	x32	readv			compat_sys_readv
-516	x32	writev			compat_sys_writev
+515	x32	readv			sys_readv
+516	x32	writev			sys_writev
 517	x32	recvfrom		compat_sys_recvfrom
 518	x32	sendmsg			compat_sys_sendmsg
 519	x32	recvmsg			compat_sys_recvmsg
diff --git a/fs/read_write.c b/fs/read_write.c
index eab427b7cc0a3f..6c13f744c34a38 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1074,13 +1074,6 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
  * in_compat_syscall().
  */
 #ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-		const struct iovec __user *, vec,
-		compat_ulong_t, vlen)
-{
-	return do_readv(fd, vec, vlen, 0);
-}
-
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
 		const struct iovec __user *, vec,
@@ -1122,13 +1115,6 @@ COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
 	return do_preadv(fd, vec, vlen, pos, flags);
 }
 
-COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
-		const struct iovec __user *, vec,
-		compat_ulong_t, vlen)
-{
-	return do_writev(fd, vec, vlen, 0);
-}
-
 #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64
 COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
 		const struct iovec __user *, vec,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 306ea7e1172d8d..0f1620988267e6 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -545,10 +545,6 @@ asmlinkage long compat_sys_getdents(unsigned int fd,
 
 /* fs/read_write.c */
 asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);
-asmlinkage ssize_t compat_sys_readv(compat_ulong_t fd,
-		const struct iovec __user *vec, compat_ulong_t vlen);
-asmlinkage ssize_t compat_sys_writev(compat_ulong_t fd,
-		const struct iovec __user *vec, compat_ulong_t vlen);
 /* No generic prototype for pread64 and pwrite64 */
 asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
 		const struct iovec __user *vec,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 995b36c2ea7d8a..211c9eacbda6eb 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -207,9 +207,9 @@ __SYSCALL(__NR_read, sys_read)
 #define __NR_write 64
 __SYSCALL(__NR_write, sys_write)
 #define __NR_readv 65
-__SC_COMP(__NR_readv, sys_readv, compat_sys_readv)
+__SC_COMP(__NR_readv, sys_readv, sys_readv)
 #define __NR_writev 66
-__SC_COMP(__NR_writev, sys_writev, compat_sys_writev)
+__SC_COMP(__NR_writev, sys_writev, sys_writev)
 #define __NR_pread64 67
 __SC_COMP(__NR_pread64, sys_pread64, compat_sys_pread64)
 #define __NR_pwrite64 68
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 995b36c2ea7d8a..211c9eacbda6eb 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -207,9 +207,9 @@ __SYSCALL(__NR_read, sys_read)
 #define __NR_write 64
 __SYSCALL(__NR_write, sys_write)
 #define __NR_readv 65
-__SC_COMP(__NR_readv, sys_readv, compat_sys_readv)
+__SC_COMP(__NR_readv, sys_readv, sys_readv)
 #define __NR_writev 66
-__SC_COMP(__NR_writev, sys_writev, compat_sys_writev)
+__SC_COMP(__NR_writev, sys_writev, sys_writev)
 #define __NR_pread64 67
 __SC_COMP(__NR_pread64, sys_pread64, compat_sys_pread64)
 #define __NR_pwrite64 68
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 3ca6fe057a0b1f..46be68029587f9 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -189,8 +189,8 @@
 142	common	_newselect			sys_select			compat_sys_select
 143	common	flock				sys_flock
 144	common	msync				sys_msync
-145	common	readv				sys_readv			compat_sys_readv
-146	common	writev				sys_writev			compat_sys_writev
+145	common	readv				sys_readv
+146	common	writev				sys_writev
 147	common	getsid				sys_getsid
 148	common	fdatasync			sys_fdatasync
 149	nospu	_sysctl				sys_ni_syscall
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index 6a0bbea225db0d..fb5e61ce9d5838 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -134,8 +134,8 @@
 142  64		select			sys_select			-
 143  common	flock			sys_flock			sys_flock
 144  common	msync			sys_msync			compat_sys_msync
-145  common	readv			sys_readv			compat_sys_readv
-146  common	writev			sys_writev			compat_sys_writev
+145  common	readv			sys_readv
+146  common	writev			sys_writev
 147  common	getsid			sys_getsid			sys_getsid
 148  common	fdatasync		sys_fdatasync			sys_fdatasync
 149  common	_sysctl			-				-
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index f30d6ae9a6883c..b1e59957c5c51c 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -371,8 +371,8 @@
 512	x32	rt_sigaction		compat_sys_rt_sigaction
 513	x32	rt_sigreturn		compat_sys_x32_rt_sigreturn
 514	x32	ioctl			compat_sys_ioctl
-515	x32	readv			compat_sys_readv
-516	x32	writev			compat_sys_writev
+515	x32	readv			sys_readv
+516	x32	writev			sys_writev
 517	x32	recvfrom		compat_sys_recvfrom
 518	x32	sendmsg			compat_sys_sendmsg
 519	x32	recvmsg			compat_sys_recvmsg
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 7/9] fs: remove compat_sys_vmsplice
  2020-09-25  4:51 ` let import_iovec deal with compat_iovecs as well v4 Christoph Hellwig
                     ` (5 preceding siblings ...)
  2020-09-25  4:51   ` [PATCH 6/9] fs: remove the compat readv/writev syscalls Christoph Hellwig
@ 2020-09-25  4:51   ` Christoph Hellwig
  2020-09-25  4:51   ` [PATCH 8/9] mm: remove compat_process_vm_{readv,writev} Christoph Hellwig
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-25  4:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native vmsplice syscall
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/unistd32.h             |  2 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl     |  2 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl     |  2 +-
 arch/parisc/kernel/syscalls/syscall.tbl       |  2 +-
 arch/powerpc/kernel/syscalls/syscall.tbl      |  2 +-
 arch/s390/kernel/syscalls/syscall.tbl         |  2 +-
 arch/sparc/kernel/syscalls/syscall.tbl        |  2 +-
 arch/x86/entry/syscall_x32.c                  |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl        |  2 +-
 arch/x86/entry/syscalls/syscall_64.tbl        |  2 +-
 fs/splice.c                                   | 57 +++++--------------
 include/linux/compat.h                        |  4 --
 include/uapi/asm-generic/unistd.h             |  2 +-
 tools/include/uapi/asm-generic/unistd.h       |  2 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  2 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  2 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl    |  2 +-
 17 files changed, 28 insertions(+), 62 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 4a236493dca5b9..11dfae3a8563bd 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -697,7 +697,7 @@ __SYSCALL(__NR_sync_file_range2, compat_sys_aarch32_sync_file_range2)
 #define __NR_tee 342
 __SYSCALL(__NR_tee, sys_tee)
 #define __NR_vmsplice 343
-__SYSCALL(__NR_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages 344
 __SYSCALL(__NR_move_pages, compat_sys_move_pages)
 #define __NR_getcpu 345
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index c99a92646f8ee9..5a39d4de0ac85b 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -278,7 +278,7 @@
 267	n32	splice				sys_splice
 268	n32	sync_file_range			sys_sync_file_range
 269	n32	tee				sys_tee
-270	n32	vmsplice			compat_sys_vmsplice
+270	n32	vmsplice			sys_vmsplice
 271	n32	move_pages			compat_sys_move_pages
 272	n32	set_robust_list			compat_sys_set_robust_list
 273	n32	get_robust_list			compat_sys_get_robust_list
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 075064d10661bf..136efc6b8c5444 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -318,7 +318,7 @@
 304	o32	splice				sys_splice
 305	o32	sync_file_range			sys_sync_file_range		sys32_sync_file_range
 306	o32	tee				sys_tee
-307	o32	vmsplice			sys_vmsplice			compat_sys_vmsplice
+307	o32	vmsplice			sys_vmsplice
 308	o32	move_pages			sys_move_pages			compat_sys_move_pages
 309	o32	set_robust_list			sys_set_robust_list		compat_sys_set_robust_list
 310	o32	get_robust_list			sys_get_robust_list		compat_sys_get_robust_list
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 192abde0001d9d..a9e184192caedd 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -330,7 +330,7 @@
 292	32	sync_file_range		parisc_sync_file_range
 292	64	sync_file_range		sys_sync_file_range
 293	common	tee			sys_tee
-294	common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+294	common	vmsplice		sys_vmsplice
 295	common	move_pages		sys_move_pages			compat_sys_move_pages
 296	common	getcpu			sys_getcpu
 297	common	epoll_pwait		sys_epoll_pwait			compat_sys_epoll_pwait
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 6f1e2ecf0edad9..0d4985919ca34d 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -369,7 +369,7 @@
 282	common	unshare				sys_unshare
 283	common	splice				sys_splice
 284	common	tee				sys_tee
-285	common	vmsplice			sys_vmsplice			compat_sys_vmsplice
+285	common	vmsplice			sys_vmsplice
 286	common	openat				sys_openat			compat_sys_openat
 287	common	mkdirat				sys_mkdirat
 288	common	mknodat				sys_mknodat
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 6101cf2e004cb4..b5495a42814bd1 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -316,7 +316,7 @@
 306  common	splice			sys_splice			sys_splice
 307  common	sync_file_range		sys_sync_file_range		compat_sys_s390_sync_file_range
 308  common	tee			sys_tee				sys_tee
-309  common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+309  common	vmsplice		sys_vmsplice			sys_vmsplice
 310  common	move_pages		sys_move_pages			compat_sys_move_pages
 311  common	getcpu			sys_getcpu			sys_getcpu
 312  common	epoll_pwait		sys_epoll_pwait			compat_sys_epoll_pwait
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index a87ddb282ab16f..f1810c1a35caa5 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -38,7 +38,7 @@
 23	64    	setuid			sys_setuid
 24	32	getuid			sys_getuid16
 24	64   	getuid			sys_getuid
-25	common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+25	common	vmsplice		sys_vmsplice
 26	common	ptrace			sys_ptrace			compat_sys_ptrace
 27	common	alarm			sys_alarm
 28	common	sigaltstack		sys_sigaltstack			compat_sys_sigaltstack
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index aa321444a41f63..a4840b9d50ad14 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -16,6 +16,7 @@
 #define __x32_sys_writev	__x64_sys_writev
 #define __x32_sys_getsockopt	__x64_sys_getsockopt
 #define __x32_sys_setsockopt	__x64_sys_setsockopt
+#define __x32_sys_vmsplice	__x64_sys_vmsplice
 
 #define __SYSCALL_64(nr, sym)
 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 54ab4beb517f25..0fb2f172581e51 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -327,7 +327,7 @@
 313	i386	splice			sys_splice
 314	i386	sync_file_range		sys_ia32_sync_file_range
 315	i386	tee			sys_tee
-316	i386	vmsplice		sys_vmsplice			compat_sys_vmsplice
+316	i386	vmsplice		sys_vmsplice
 317	i386	move_pages		sys_move_pages			compat_sys_move_pages
 318	i386	getcpu			sys_getcpu
 319	i386	epoll_pwait		sys_epoll_pwait
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index b1e59957c5c51c..642af919183de4 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -388,7 +388,7 @@
 529	x32	waitid			compat_sys_waitid
 530	x32	set_robust_list		compat_sys_set_robust_list
 531	x32	get_robust_list		compat_sys_get_robust_list
-532	x32	vmsplice		compat_sys_vmsplice
+532	x32	vmsplice		sys_vmsplice
 533	x32	move_pages		compat_sys_move_pages
 534	x32	preadv			compat_sys_preadv64
 535	x32	pwritev			compat_sys_pwritev64
diff --git a/fs/splice.c b/fs/splice.c
index 132d42b9871f9b..18d84544030b39 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -33,7 +33,6 @@
 #include <linux/security.h>
 #include <linux/gfp.h>
 #include <linux/socket.h>
-#include <linux/compat.h>
 #include <linux/sched/signal.h>
 
 #include "internal.h"
@@ -1332,20 +1331,6 @@ static int vmsplice_type(struct fd f, int *type)
  * Currently we punt and implement it as a normal copy, see pipe_to_user().
  *
  */
-static long do_vmsplice(struct file *f, struct iov_iter *iter, unsigned int flags)
-{
-	if (unlikely(flags & ~SPLICE_F_ALL))
-		return -EINVAL;
-
-	if (!iov_iter_count(iter))
-		return 0;
-
-	if (iov_iter_rw(iter) = WRITE)
-		return vmsplice_to_pipe(f, iter, flags);
-	else
-		return vmsplice_to_user(f, iter, flags);
-}
-
 SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 		unsigned long, nr_segs, unsigned int, flags)
 {
@@ -1356,6 +1341,9 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 	struct fd f;
 	int type;
 
+	if (unlikely(flags & ~SPLICE_F_ALL))
+		return -EINVAL;
+
 	f = fdget(fd);
 	error = vmsplice_type(f, &type);
 	if (error)
@@ -1363,40 +1351,21 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 
 	error = import_iovec(type, uiov, nr_segs,
 			     ARRAY_SIZE(iovstack), &iov, &iter);
-	if (error >= 0) {
-		error = do_vmsplice(f.file, &iter, flags);
-		kfree(iov);
-	}
-	fdput(f);
-	return error;
-}
+	if (error < 0)
+		goto out_fdput;
 
-#ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, iov32,
-		    unsigned int, nr_segs, unsigned int, flags)
-{
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
-	ssize_t error;
-	struct fd f;
-	int type;
-
-	f = fdget(fd);
-	error = vmsplice_type(f, &type);
-	if (error)
-		return error;
+	if (!iov_iter_count(&iter))
+		error = 0;
+	else if (iov_iter_rw(&iter) = WRITE)
+		error = vmsplice_to_pipe(f.file, &iter, flags);
+	else
+		error = vmsplice_to_user(f.file, &iter, flags);
 
-	error = import_iovec(type, (struct iovec __user *)iov32, nr_segs,
-			     ARRAY_SIZE(iovstack), &iov, &iter);
-	if (error >= 0) {
-		error = do_vmsplice(f.file, &iter, flags);
-		kfree(iov);
-	}
+	kfree(iov);
+out_fdput:
 	fdput(f);
 	return error;
 }
-#endif
 
 SYSCALL_DEFINE6(splice, int, fd_in, loff_t __user *, off_in,
 		int, fd_out, loff_t __user *, off_out,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 0f1620988267e6..9e8aa148651455 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -597,10 +597,6 @@ asmlinkage long compat_sys_signalfd4(int ufd,
 				     const compat_sigset_t __user *sigmask,
 				     compat_size_t sigsetsize, int flags);
 
-/* fs/splice.c */
-asmlinkage long compat_sys_vmsplice(int fd, const struct compat_iovec __user *,
-				    unsigned int nr_segs, unsigned int flags);
-
 /* fs/stat.c */
 asmlinkage long compat_sys_newfstatat(unsigned int dfd,
 				      const char __user *filename,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 211c9eacbda6eb..f2dcb0d5703014 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -237,7 +237,7 @@ __SC_COMP(__NR_signalfd4, sys_signalfd4, compat_sys_signalfd4)
 
 /* fs/splice.c */
 #define __NR_vmsplice 75
-__SC_COMP(__NR_vmsplice, sys_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_splice 76
 __SYSCALL(__NR_splice, sys_splice)
 #define __NR_tee 77
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 211c9eacbda6eb..f2dcb0d5703014 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -237,7 +237,7 @@ __SC_COMP(__NR_signalfd4, sys_signalfd4, compat_sys_signalfd4)
 
 /* fs/splice.c */
 #define __NR_vmsplice 75
-__SC_COMP(__NR_vmsplice, sys_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_splice 76
 __SYSCALL(__NR_splice, sys_splice)
 #define __NR_tee 77
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 46be68029587f9..26f0347c15118b 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -363,7 +363,7 @@
 282	common	unshare				sys_unshare
 283	common	splice				sys_splice
 284	common	tee				sys_tee
-285	common	vmsplice			sys_vmsplice			compat_sys_vmsplice
+285	common	vmsplice			sys_vmsplice
 286	common	openat				sys_openat			compat_sys_openat
 287	common	mkdirat				sys_mkdirat
 288	common	mknodat				sys_mknodat
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index fb5e61ce9d5838..02ad81f69bb7e3 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -316,7 +316,7 @@
 306  common	splice			sys_splice			compat_sys_splice
 307  common	sync_file_range		sys_sync_file_range		compat_sys_s390_sync_file_range
 308  common	tee			sys_tee				compat_sys_tee
-309  common	vmsplice		sys_vmsplice			compat_sys_vmsplice
+309  common	vmsplice		sys_vmsplice			sys_vmsplice
 310  common	move_pages		sys_move_pages			compat_sys_move_pages
 311  common	getcpu			sys_getcpu			compat_sys_getcpu
 312  common	epoll_pwait		sys_epoll_pwait			compat_sys_epoll_pwait
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index b1e59957c5c51c..642af919183de4 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -388,7 +388,7 @@
 529	x32	waitid			compat_sys_waitid
 530	x32	set_robust_list		compat_sys_set_robust_list
 531	x32	get_robust_list		compat_sys_get_robust_list
-532	x32	vmsplice		compat_sys_vmsplice
+532	x32	vmsplice		sys_vmsplice
 533	x32	move_pages		compat_sys_move_pages
 534	x32	preadv			compat_sys_preadv64
 535	x32	pwritev			compat_sys_pwritev64
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 8/9] mm: remove compat_process_vm_{readv,writev}
  2020-09-25  4:51 ` let import_iovec deal with compat_iovecs as well v4 Christoph Hellwig
                     ` (6 preceding siblings ...)
  2020-09-25  4:51   ` [PATCH 7/9] fs: remove compat_sys_vmsplice Christoph Hellwig
@ 2020-09-25  4:51   ` Christoph Hellwig
  2020-09-25  4:51   ` [PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov Christoph Hellwig
  2020-09-25 15:23   ` let import_iovec deal with compat_iovecs as well v4 Al Viro
  9 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-25  4:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native syscalls
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/unistd32.h             |  4 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl     |  4 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl     |  4 +-
 arch/parisc/kernel/syscalls/syscall.tbl       |  4 +-
 arch/powerpc/kernel/syscalls/syscall.tbl      |  4 +-
 arch/s390/kernel/syscalls/syscall.tbl         |  4 +-
 arch/sparc/kernel/syscalls/syscall.tbl        |  4 +-
 arch/x86/entry/syscall_x32.c                  |  2 +
 arch/x86/entry/syscalls/syscall_32.tbl        |  4 +-
 arch/x86/entry/syscalls/syscall_64.tbl        |  4 +-
 include/linux/compat.h                        |  8 ---
 include/uapi/asm-generic/unistd.h             |  6 +-
 mm/process_vm_access.c                        | 69 -------------------
 tools/include/uapi/asm-generic/unistd.h       |  6 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  4 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  4 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl    |  4 +-
 17 files changed, 30 insertions(+), 109 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 11dfae3a8563bd..0c280a05f699bf 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -763,9 +763,9 @@ __SYSCALL(__NR_sendmmsg, compat_sys_sendmmsg)
 #define __NR_setns 375
 __SYSCALL(__NR_setns, sys_setns)
 #define __NR_process_vm_readv 376
-__SYSCALL(__NR_process_vm_readv, compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 377
-__SYSCALL(__NR_process_vm_writev, compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 378
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 379
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 5a39d4de0ac85b..0bc2e0fcf1ee56 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -317,8 +317,8 @@
 306	n32	syncfs				sys_syncfs
 307	n32	sendmmsg			compat_sys_sendmmsg
 308	n32	setns				sys_setns
-309	n32	process_vm_readv		compat_sys_process_vm_readv
-310	n32	process_vm_writev		compat_sys_process_vm_writev
+309	n32	process_vm_readv		sys_process_vm_readv
+310	n32	process_vm_writev		sys_process_vm_writev
 311	n32	kcmp				sys_kcmp
 312	n32	finit_module			sys_finit_module
 313	n32	sched_setattr			sys_sched_setattr
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 136efc6b8c5444..b408c13b934296 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -356,8 +356,8 @@
 342	o32	syncfs				sys_syncfs
 343	o32	sendmmsg			sys_sendmmsg			compat_sys_sendmmsg
 344	o32	setns				sys_setns
-345	o32	process_vm_readv		sys_process_vm_readv		compat_sys_process_vm_readv
-346	o32	process_vm_writev		sys_process_vm_writev		compat_sys_process_vm_writev
+345	o32	process_vm_readv		sys_process_vm_readv
+346	o32	process_vm_writev		sys_process_vm_writev
 347	o32	kcmp				sys_kcmp
 348	o32	finit_module			sys_finit_module
 349	o32	sched_setattr			sys_sched_setattr
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index a9e184192caedd..2015a5124b78ad 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -372,8 +372,8 @@
 327	common	syncfs			sys_syncfs
 328	common	setns			sys_setns
 329	common	sendmmsg		sys_sendmmsg			compat_sys_sendmmsg
-330	common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-331	common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+330	common	process_vm_readv	sys_process_vm_readv
+331	common	process_vm_writev	sys_process_vm_writev
 332	common	kcmp			sys_kcmp
 333	common	finit_module		sys_finit_module
 334	common	sched_setattr		sys_sched_setattr
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 0d4985919ca34d..66a472aa635d3f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -449,8 +449,8 @@
 348	common	syncfs				sys_syncfs
 349	common	sendmmsg			sys_sendmmsg			compat_sys_sendmmsg
 350	common	setns				sys_setns
-351	nospu	process_vm_readv		sys_process_vm_readv		compat_sys_process_vm_readv
-352	nospu	process_vm_writev		sys_process_vm_writev		compat_sys_process_vm_writev
+351	nospu	process_vm_readv		sys_process_vm_readv
+352	nospu	process_vm_writev		sys_process_vm_writev
 353	nospu	finit_module			sys_finit_module
 354	nospu	kcmp				sys_kcmp
 355	common	sched_setattr			sys_sched_setattr
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index b5495a42814bd1..7485867a490bb2 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -347,8 +347,8 @@
 337  common	clock_adjtime		sys_clock_adjtime		sys_clock_adjtime32
 338  common	syncfs			sys_syncfs			sys_syncfs
 339  common	setns			sys_setns			sys_setns
-340  common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-341  common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+340  common	process_vm_readv	sys_process_vm_readv		sys_process_vm_readv
+341  common	process_vm_writev	sys_process_vm_writev		sys_process_vm_writev
 342  common	s390_runtime_instr	sys_s390_runtime_instr		sys_s390_runtime_instr
 343  common	kcmp			sys_kcmp			sys_kcmp
 344  common	finit_module		sys_finit_module		sys_finit_module
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index f1810c1a35caa5..4a9365b2e340b2 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -406,8 +406,8 @@
 335	common	syncfs			sys_syncfs
 336	common	sendmmsg		sys_sendmmsg			compat_sys_sendmmsg
 337	common	setns			sys_setns
-338	common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-339	common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+338	common	process_vm_readv	sys_process_vm_readv
+339	common	process_vm_writev	sys_process_vm_writev
 340	32	kern_features		sys_ni_syscall			sys_kern_features
 340	64	kern_features		sys_kern_features
 341	common	kcmp			sys_kcmp
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index a4840b9d50ad14..f2fe0a33bcfdd5 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -17,6 +17,8 @@
 #define __x32_sys_getsockopt	__x64_sys_getsockopt
 #define __x32_sys_setsockopt	__x64_sys_setsockopt
 #define __x32_sys_vmsplice	__x64_sys_vmsplice
+#define __x32_sys_process_vm_readv	__x64_sys_process_vm_readv
+#define __x32_sys_process_vm_writev	__x64_sys_process_vm_writev
 
 #define __SYSCALL_64(nr, sym)
 
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 0fb2f172581e51..5fbe10ad8a23fc 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -358,8 +358,8 @@
 344	i386	syncfs			sys_syncfs
 345	i386	sendmmsg		sys_sendmmsg			compat_sys_sendmmsg
 346	i386	setns			sys_setns
-347	i386	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-348	i386	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+347	i386	process_vm_readv	sys_process_vm_readv
+348	i386	process_vm_writev	sys_process_vm_writev
 349	i386	kcmp			sys_kcmp
 350	i386	finit_module		sys_finit_module
 351	i386	sched_setattr		sys_sched_setattr
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 642af919183de4..347809649ba28f 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -395,8 +395,8 @@
 536	x32	rt_tgsigqueueinfo	compat_sys_rt_tgsigqueueinfo
 537	x32	recvmmsg		compat_sys_recvmmsg_time64
 538	x32	sendmmsg		compat_sys_sendmmsg
-539	x32	process_vm_readv	compat_sys_process_vm_readv
-540	x32	process_vm_writev	compat_sys_process_vm_writev
+539	x32	process_vm_readv	sys_process_vm_readv
+540	x32	process_vm_writev	sys_process_vm_writev
 541	x32	setsockopt		sys_setsockopt
 542	x32	getsockopt		sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 9e8aa148651455..2ae4e4bc22c0a4 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -780,14 +780,6 @@ asmlinkage long compat_sys_open_by_handle_at(int mountdirfd,
 					     int flags);
 asmlinkage long compat_sys_sendmmsg(int fd, struct compat_mmsghdr __user *mmsg,
 				    unsigned vlen, unsigned int flags);
-asmlinkage ssize_t compat_sys_process_vm_readv(compat_pid_t pid,
-		const struct compat_iovec __user *lvec,
-		compat_ulong_t liovcnt, const struct compat_iovec __user *rvec,
-		compat_ulong_t riovcnt, compat_ulong_t flags);
-asmlinkage ssize_t compat_sys_process_vm_writev(compat_pid_t pid,
-		const struct compat_iovec __user *lvec,
-		compat_ulong_t liovcnt, const struct compat_iovec __user *rvec,
-		compat_ulong_t riovcnt, compat_ulong_t flags);
 asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
 		     const compat_uptr_t __user *argv,
 		     const compat_uptr_t __user *envp, int flags);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f2dcb0d5703014..c1dfe99c9c3f70 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -727,11 +727,9 @@ __SYSCALL(__NR_setns, sys_setns)
 #define __NR_sendmmsg 269
 __SC_COMP(__NR_sendmmsg, sys_sendmmsg, compat_sys_sendmmsg)
 #define __NR_process_vm_readv 270
-__SC_COMP(__NR_process_vm_readv, sys_process_vm_readv, \
-          compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 271
-__SC_COMP(__NR_process_vm_writev, sys_process_vm_writev, \
-          compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 272
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 273
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 3f2156aab44263..fd12da80b6f27b 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -14,10 +14,6 @@
 #include <linux/slab.h>
 #include <linux/syscalls.h>
 
-#ifdef CONFIG_COMPAT
-#include <linux/compat.h>
-#endif
-
 /**
  * process_vm_rw_pages - read/write pages from task specified
  * @pages: array of pointers to pages we want to copy
@@ -304,68 +300,3 @@ SYSCALL_DEFINE6(process_vm_writev, pid_t, pid,
 {
 	return process_vm_rw(pid, lvec, liovcnt, rvec, riovcnt, flags, 1);
 }
-
-#ifdef CONFIG_COMPAT
-
-static ssize_t
-compat_process_vm_rw(compat_pid_t pid,
-		     const struct compat_iovec __user *lvec,
-		     unsigned long liovcnt,
-		     const struct compat_iovec __user *rvec,
-		     unsigned long riovcnt,
-		     unsigned long flags, int vm_write)
-{
-	struct iovec iovstack_l[UIO_FASTIOV];
-	struct iovec iovstack_r[UIO_FASTIOV];
-	struct iovec *iov_l = iovstack_l;
-	struct iovec *iov_r = iovstack_r;
-	struct iov_iter iter;
-	ssize_t rc = -EFAULT;
-	int dir = vm_write ? WRITE : READ;
-
-	if (flags != 0)
-		return -EINVAL;
-
-	rc = import_iovec(dir, (const struct iovec __user *)lvec, liovcnt,
-			  UIO_FASTIOV, &iov_l, &iter);
-	if (rc < 0)
-		return rc;
-	if (!iov_iter_count(&iter))
-		goto free_iov_l;
-	iov_r = iovec_from_user((const struct iovec __user *)rvec, riovcnt,
-				UIO_FASTIOV, iovstack_r, true);
-	if (IS_ERR(iov_r)) {
-		rc = PTR_ERR(iov_r);
-		goto free_iov_l;
-	}
-	rc = process_vm_rw_core(pid, &iter, iov_r, riovcnt, flags, vm_write);
-	if (iov_r != iovstack_r)
-		kfree(iov_r);
-free_iov_l:
-	kfree(iov_l);
-	return rc;
-}
-
-COMPAT_SYSCALL_DEFINE6(process_vm_readv, compat_pid_t, pid,
-		       const struct compat_iovec __user *, lvec,
-		       compat_ulong_t, liovcnt,
-		       const struct compat_iovec __user *, rvec,
-		       compat_ulong_t, riovcnt,
-		       compat_ulong_t, flags)
-{
-	return compat_process_vm_rw(pid, lvec, liovcnt, rvec,
-				    riovcnt, flags, 0);
-}
-
-COMPAT_SYSCALL_DEFINE6(process_vm_writev, compat_pid_t, pid,
-		       const struct compat_iovec __user *, lvec,
-		       compat_ulong_t, liovcnt,
-		       const struct compat_iovec __user *, rvec,
-		       compat_ulong_t, riovcnt,
-		       compat_ulong_t, flags)
-{
-	return compat_process_vm_rw(pid, lvec, liovcnt, rvec,
-				    riovcnt, flags, 1);
-}
-
-#endif
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index f2dcb0d5703014..c1dfe99c9c3f70 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -727,11 +727,9 @@ __SYSCALL(__NR_setns, sys_setns)
 #define __NR_sendmmsg 269
 __SC_COMP(__NR_sendmmsg, sys_sendmmsg, compat_sys_sendmmsg)
 #define __NR_process_vm_readv 270
-__SC_COMP(__NR_process_vm_readv, sys_process_vm_readv, \
-          compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 271
-__SC_COMP(__NR_process_vm_writev, sys_process_vm_writev, \
-          compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 272
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 273
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 26f0347c15118b..a188f053cbf90a 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -443,8 +443,8 @@
 348	common	syncfs				sys_syncfs
 349	common	sendmmsg			sys_sendmmsg			compat_sys_sendmmsg
 350	common	setns				sys_setns
-351	nospu	process_vm_readv		sys_process_vm_readv		compat_sys_process_vm_readv
-352	nospu	process_vm_writev		sys_process_vm_writev		compat_sys_process_vm_writev
+351	nospu	process_vm_readv		sys_process_vm_readv
+352	nospu	process_vm_writev		sys_process_vm_writev
 353	nospu	finit_module			sys_finit_module
 354	nospu	kcmp				sys_kcmp
 355	common	sched_setattr			sys_sched_setattr
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index 02ad81f69bb7e3..c44c83032c3a04 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -347,8 +347,8 @@
 337  common	clock_adjtime		sys_clock_adjtime		compat_sys_clock_adjtime
 338  common	syncfs			sys_syncfs			sys_syncfs
 339  common	setns			sys_setns			sys_setns
-340  common	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
-341  common	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+340  common	process_vm_readv	sys_process_vm_readv		sys_process_vm_readv
+341  common	process_vm_writev	sys_process_vm_writev		sys_process_vm_writev
 342  common	s390_runtime_instr	sys_s390_runtime_instr		sys_s390_runtime_instr
 343  common	kcmp			sys_kcmp			compat_sys_kcmp
 344  common	finit_module		sys_finit_module		compat_sys_finit_module
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index 642af919183de4..347809649ba28f 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -395,8 +395,8 @@
 536	x32	rt_tgsigqueueinfo	compat_sys_rt_tgsigqueueinfo
 537	x32	recvmmsg		compat_sys_recvmmsg_time64
 538	x32	sendmmsg		compat_sys_sendmmsg
-539	x32	process_vm_readv	compat_sys_process_vm_readv
-540	x32	process_vm_writev	compat_sys_process_vm_writev
+539	x32	process_vm_readv	sys_process_vm_readv
+540	x32	process_vm_writev	sys_process_vm_writev
 541	x32	setsockopt		sys_setsockopt
 542	x32	getsockopt		sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* [PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov
  2020-09-25  4:51 ` let import_iovec deal with compat_iovecs as well v4 Christoph Hellwig
                     ` (7 preceding siblings ...)
  2020-09-25  4:51   ` [PATCH 8/9] mm: remove compat_process_vm_{readv,writev} Christoph Hellwig
@ 2020-09-25  4:51   ` Christoph Hellwig
  2020-09-25 15:23   ` let import_iovec deal with compat_iovecs as well v4 Al Viro
  9 siblings, 0 replies; 171+ messages in thread
From: Christoph Hellwig @ 2020-09-25  4:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

Now that import_iovec handles compat iovecs, the native version of
keyctl_instantiate_key_iov can be used for the compat case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 security/keys/compat.c   | 36 ++----------------------------------
 security/keys/internal.h |  5 -----
 security/keys/keyctl.c   |  2 +-
 3 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/security/keys/compat.c b/security/keys/compat.c
index 7ae531db031cf8..1545efdca56227 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -11,38 +11,6 @@
 #include <linux/slab.h>
 #include "internal.h"
 
-/*
- * Instantiate a key with the specified compatibility multipart payload and
- * link the key into the destination keyring if one is given.
- *
- * The caller must have the appropriate instantiation permit set for this to
- * work (see keyctl_assume_authority).  No other permissions are required.
- *
- * If successful, 0 will be returned.
- */
-static long compat_keyctl_instantiate_key_iov(
-	key_serial_t id,
-	const struct compat_iovec __user *_payload_iov,
-	unsigned ioc,
-	key_serial_t ringid)
-{
-	struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
-	struct iov_iter from;
-	long ret;
-
-	if (!_payload_iov)
-		ioc = 0;
-
-	ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
-			   ioc, ARRAY_SIZE(iovstack), &iov, &from);
-	if (ret < 0)
-		return ret;
-
-	ret = keyctl_instantiate_key_common(id, &from, ringid);
-	kfree(iov);
-	return ret;
-}
-
 /*
  * The key control system call, 32-bit compatibility version for 64-bit archs
  */
@@ -113,8 +81,8 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option,
 		return keyctl_reject_key(arg2, arg3, arg4, arg5);
 
 	case KEYCTL_INSTANTIATE_IOV:
-		return compat_keyctl_instantiate_key_iov(
-			arg2, compat_ptr(arg3), arg4, arg5);
+		return keyctl_instantiate_key_iov(arg2, compat_ptr(arg3), arg4,
+						  arg5);
 
 	case KEYCTL_INVALIDATE:
 		return keyctl_invalidate_key(arg2);
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 338a526cbfa516..9b9cf3b6fcbb4d 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -262,11 +262,6 @@ extern long keyctl_instantiate_key_iov(key_serial_t,
 				       const struct iovec __user *,
 				       unsigned, key_serial_t);
 extern long keyctl_invalidate_key(key_serial_t);
-
-struct iov_iter;
-extern long keyctl_instantiate_key_common(key_serial_t,
-					  struct iov_iter *,
-					  key_serial_t);
 extern long keyctl_restrict_keyring(key_serial_t id,
 				    const char __user *_type,
 				    const char __user *_restriction);
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 9febd37a168fd0..e26bbccda7ccee 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -1164,7 +1164,7 @@ static int keyctl_change_reqkey_auth(struct key *key)
  *
  * If successful, 0 will be returned.
  */
-long keyctl_instantiate_key_common(key_serial_t id,
+static long keyctl_instantiate_key_common(key_serial_t id,
 				   struct iov_iter *from,
 				   key_serial_t ringid)
 {
-- 
2.28.0

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: let import_iovec deal with compat_iovecs as well v4
  2020-09-25  4:51 ` let import_iovec deal with compat_iovecs as well v4 Christoph Hellwig
                     ` (8 preceding siblings ...)
  2020-09-25  4:51   ` [PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov Christoph Hellwig
@ 2020-09-25 15:23   ` Al Viro
  9 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-09-25 15:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	David Laight, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Fri, Sep 25, 2020 at 06:51:37AM +0200, Christoph Hellwig wrote:
> Hi Al,
> 
> this series changes import_iovec to transparently deal with compat iovec
> structures, and then cleanups up a lot of code dupliation.

OK, I can live with that.  Applied, let's see if it passes smoke tests
into -next it goes.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-09-25  4:51   ` [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c Christoph Hellwig
@ 2020-10-21 16:13     ` Greg KH
  2020-10-21 20:59       ` David Laight
  2020-10-21 23:39       ` Al Viro
  0 siblings, 2 replies; 171+ messages in thread
From: Greg KH @ 2020-10-21 16:13 UTC (permalink / raw)
  To: Christoph Hellwig, kernel-team
  Cc: Alexander Viro, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, David Laight, linux-arm-kernel, linux-kernel,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, sparclinux,
	linux-block, linux-scsi, linux-fsdevel, linux-aio, io-uring,
	linux-arch, linux-mm, netdev, keyrings, linux-security-module

On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
> From: David Laight <David.Laight@ACULAB.COM>
> 
> This lets the compiler inline it into import_iovec() generating
> much better code.
> 
> Signed-off-by: David Laight <david.laight@aculab.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/read_write.c | 179 ------------------------------------------------
>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 176 insertions(+), 179 deletions(-)

Strangely, this commit causes a regression in Linus's tree right now.

I can't really figure out what the regression is, only that this commit
triggers a "large Android system binary" from working properly.  There's
no kernel log messages anywhere, and I don't have any way to strace the
thing in the testing framework, so any hints that people can provide
would be most appreciated.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-21 16:13     ` Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c" Greg KH
@ 2020-10-21 20:59       ` David Laight
  2020-10-21 23:39       ` Al Viro
  1 sibling, 0 replies; 171+ messages in thread
From: David Laight @ 2020-10-21 20:59 UTC (permalink / raw)
  To: 'Greg KH', Christoph Hellwig, kernel-team
  Cc: Alexander Viro, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

From: Greg KH
> Sent: 21 October 2020 17:13
> 
> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
> > From: David Laight <David.Laight@ACULAB.COM>
> >
> > This lets the compiler inline it into import_iovec() generating
> > much better code.
> >
> > Signed-off-by: David Laight <david.laight@aculab.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/read_write.c | 179 ------------------------------------------------
> >  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 176 insertions(+), 179 deletions(-)
> 
> Strangely, this commit causes a regression in Linus's tree right now.
> 
> I can't really figure out what the regression is, only that this commit
> triggers a "large Android system binary" from working properly.  There's
> no kernel log messages anywhere, and I don't have any way to strace the
> thing in the testing framework, so any hints that people can provide
> would be most appreciated.

My original commit just moved the function source from one file to another.
So it is odd that it makes any difference.
I don't even know if it gets inlined by Christoph's actual patch.
(I have another patch that depended on it that I need to resubmit.)

Some of the other changes from Christoph's same patch set might
make a difference though.

Might be worth forcing it to be not inlined - so it is no change.
Or try adding a kernel log to import_iovec() or the associated
copy failing.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-21 16:13     ` Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c" Greg KH
  2020-10-21 20:59       ` David Laight
@ 2020-10-21 23:39       ` Al Viro
  2020-10-22  8:26         ` Greg KH
  1 sibling, 1 reply; 171+ messages in thread
From: Al Viro @ 2020-10-21 23:39 UTC (permalink / raw)
  To: Greg KH
  Cc: Christoph Hellwig, kernel-team, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, David Laight, linux-arm-kernel,
	linux-kernel, linux-mips, linux-parisc, linuxppc-dev, linux-s390,
	sparclinux, linux-block, linux-scsi, linux-fsdevel, linux-aio,
	io-uring, linux-arch, linux-mm, netdev, keyrings,
	linux-security-module

On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
> > From: David Laight <David.Laight@ACULAB.COM>
> > 
> > This lets the compiler inline it into import_iovec() generating
> > much better code.
> > 
> > Signed-off-by: David Laight <david.laight@aculab.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/read_write.c | 179 ------------------------------------------------
> >  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 176 insertions(+), 179 deletions(-)
> 
> Strangely, this commit causes a regression in Linus's tree right now.
> 
> I can't really figure out what the regression is, only that this commit
> triggers a "large Android system binary" from working properly.  There's
> no kernel log messages anywhere, and I don't have any way to strace the
> thing in the testing framework, so any hints that people can provide
> would be most appreciated.

It's a pure move - modulo changed line breaks in the argument lists
the functions involved are identical before and after that (just checked
that directly, by checking out the trees before and after, extracting two
functions in question from fs/read_write.c and lib/iov_iter.c (before and
after, resp.) and checking the diff between those.

How certain is your bisection?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-21 23:39       ` Al Viro
@ 2020-10-22  8:26         ` Greg KH
  2020-10-22  8:35           ` David Hildenbrand
  0 siblings, 1 reply; 171+ messages in thread
From: Greg KH @ 2020-10-22  8:26 UTC (permalink / raw)
  To: Al Viro, Nick Desaulniers
  Cc: Christoph Hellwig, kernel-team, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, David Laight, linux-arm-kernel,
	linux-kernel, linux-mips, linux-parisc, linuxppc-dev, linux-s390,
	sparclinux, linux-block, linux-scsi, linux-fsdevel, linux-aio,
	io-uring, linux-arch, linux-mm, netdev, keyrings,
	linux-security-module

On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
> > On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
> > > From: David Laight <David.Laight@ACULAB.COM>
> > > 
> > > This lets the compiler inline it into import_iovec() generating
> > > much better code.
> > > 
> > > Signed-off-by: David Laight <david.laight@aculab.com>
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/read_write.c | 179 ------------------------------------------------
> > >  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 176 insertions(+), 179 deletions(-)
> > 
> > Strangely, this commit causes a regression in Linus's tree right now.
> > 
> > I can't really figure out what the regression is, only that this commit
> > triggers a "large Android system binary" from working properly.  There's
> > no kernel log messages anywhere, and I don't have any way to strace the
> > thing in the testing framework, so any hints that people can provide
> > would be most appreciated.
> 
> It's a pure move - modulo changed line breaks in the argument lists
> the functions involved are identical before and after that (just checked
> that directly, by checking out the trees before and after, extracting two
> functions in question from fs/read_write.c and lib/iov_iter.c (before and
> after, resp.) and checking the diff between those.
> 
> How certain is your bisection?

The bisection is very reproducable.

But, this looks now to be a compiler bug.  I'm using the latest version
of clang and if I put "noinline" at the front of the function,
everything works.

Nick, any ideas here as to who I should report this to?

I'll work on a fixup patch for the Android kernel tree to see if I can
work around it there, but others will hit this in Linus's tree sooner or
later...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  8:26         ` Greg KH
@ 2020-10-22  8:35           ` David Hildenbrand
  2020-10-22  8:40             ` David Laight
  0 siblings, 1 reply; 171+ messages in thread
From: David Hildenbrand @ 2020-10-22  8:35 UTC (permalink / raw)
  To: Greg KH, Al Viro, Nick Desaulniers
  Cc: Christoph Hellwig, kernel-team, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, David Laight, linux-arm-kernel,
	linux-kernel, linux-mips, linux-parisc, linuxppc-dev, linux-s390,
	sparclinux, linux-block, linux-scsi, linux-fsdevel, linux-aio,
	io-uring, linux-arch, linux-mm, netdev, keyrings,
	linux-security-module

On 22.10.20 10:26, Greg KH wrote:
> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
>>>> From: David Laight <David.Laight@ACULAB.COM>
>>>>
>>>> This lets the compiler inline it into import_iovec() generating
>>>> much better code.
>>>>
>>>> Signed-off-by: David Laight <david.laight@aculab.com>
>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>> ---
>>>>  fs/read_write.c | 179 ------------------------------------------------
>>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 176 insertions(+), 179 deletions(-)
>>>
>>> Strangely, this commit causes a regression in Linus's tree right now.
>>>
>>> I can't really figure out what the regression is, only that this commit
>>> triggers a "large Android system binary" from working properly.  There's
>>> no kernel log messages anywhere, and I don't have any way to strace the
>>> thing in the testing framework, so any hints that people can provide
>>> would be most appreciated.
>>
>> It's a pure move - modulo changed line breaks in the argument lists
>> the functions involved are identical before and after that (just checked
>> that directly, by checking out the trees before and after, extracting two
>> functions in question from fs/read_write.c and lib/iov_iter.c (before and
>> after, resp.) and checking the diff between those.
>>
>> How certain is your bisection?
> 
> The bisection is very reproducable.
> 
> But, this looks now to be a compiler bug.  I'm using the latest version
> of clang and if I put "noinline" at the front of the function,
> everything works.

Well, the compiler can do more invasive optimizations when inlining. If
you have buggy code that relies on some unspecified behavior, inlining
can change the behavior ... but going over that code, there isn't too
much action going on. At least nothing screamed at me.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  8:35           ` David Hildenbrand
@ 2020-10-22  8:40             ` David Laight
  2020-10-22  8:48               ` David Hildenbrand
  0 siblings, 1 reply; 171+ messages in thread
From: David Laight @ 2020-10-22  8:40 UTC (permalink / raw)
  To: 'David Hildenbrand', Greg KH, Al Viro, Nick Desaulniers
  Cc: Christoph Hellwig, kernel-team, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, linux-arm-kernel, linux-kernel,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, sparclinux,
	linux-block, linux-scsi, linux-fsdevel, linux-aio, io-uring,
	linux-arch, linux-mm, netdev, keyrings, linux-security-module

From: David Hildenbrand
> Sent: 22 October 2020 09:35
> 
> On 22.10.20 10:26, Greg KH wrote:
> > On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
> >> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
> >>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
> >>>> From: David Laight <David.Laight@ACULAB.COM>
> >>>>
> >>>> This lets the compiler inline it into import_iovec() generating
> >>>> much better code.
> >>>>
> >>>> Signed-off-by: David Laight <david.laight@aculab.com>
> >>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
> >>>> ---
> >>>>  fs/read_write.c | 179 ------------------------------------------------
> >>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  2 files changed, 176 insertions(+), 179 deletions(-)
> >>>
> >>> Strangely, this commit causes a regression in Linus's tree right now.
> >>>
> >>> I can't really figure out what the regression is, only that this commit
> >>> triggers a "large Android system binary" from working properly.  There's
> >>> no kernel log messages anywhere, and I don't have any way to strace the
> >>> thing in the testing framework, so any hints that people can provide
> >>> would be most appreciated.
> >>
> >> It's a pure move - modulo changed line breaks in the argument lists
> >> the functions involved are identical before and after that (just checked
> >> that directly, by checking out the trees before and after, extracting two
> >> functions in question from fs/read_write.c and lib/iov_iter.c (before and
> >> after, resp.) and checking the diff between those.
> >>
> >> How certain is your bisection?
> >
> > The bisection is very reproducable.
> >
> > But, this looks now to be a compiler bug.  I'm using the latest version
> > of clang and if I put "noinline" at the front of the function,
> > everything works.
> 
> Well, the compiler can do more invasive optimizations when inlining. If
> you have buggy code that relies on some unspecified behavior, inlining
> can change the behavior ... but going over that code, there isn't too
> much action going on. At least nothing screamed at me.

Apart from all the optimisations that get rid off the 'pass be reference'
parameters and strange conditional tests.
Plenty of scope for the compiler getting it wrong.
But nothing even vaguely illegal.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  8:40             ` David Laight
@ 2020-10-22  8:48               ` David Hildenbrand
  2020-10-22  9:01                 ` Greg KH
  2020-10-22  9:02                 ` David Laight
  0 siblings, 2 replies; 171+ messages in thread
From: David Hildenbrand @ 2020-10-22  8:48 UTC (permalink / raw)
  To: David Laight, Greg KH, Al Viro, Nick Desaulniers
  Cc: Christoph Hellwig, kernel-team, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, linux-arm-kernel, linux-kernel,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, sparclinux,
	linux-block, linux-scsi, linux-fsdevel, linux-aio, io-uring,
	linux-arch, linux-mm, netdev, keyrings, linux-security-module

On 22.10.20 10:40, David Laight wrote:
> From: David Hildenbrand
>> Sent: 22 October 2020 09:35
>>
>> On 22.10.20 10:26, Greg KH wrote:
>>> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
>>>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
>>>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
>>>>>> From: David Laight <David.Laight@ACULAB.COM>
>>>>>>
>>>>>> This lets the compiler inline it into import_iovec() generating
>>>>>> much better code.
>>>>>>
>>>>>> Signed-off-by: David Laight <david.laight@aculab.com>
>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>> ---
>>>>>>  fs/read_write.c | 179 ------------------------------------------------
>>>>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  2 files changed, 176 insertions(+), 179 deletions(-)
>>>>>
>>>>> Strangely, this commit causes a regression in Linus's tree right now.
>>>>>
>>>>> I can't really figure out what the regression is, only that this commit
>>>>> triggers a "large Android system binary" from working properly.  There's
>>>>> no kernel log messages anywhere, and I don't have any way to strace the
>>>>> thing in the testing framework, so any hints that people can provide
>>>>> would be most appreciated.
>>>>
>>>> It's a pure move - modulo changed line breaks in the argument lists
>>>> the functions involved are identical before and after that (just checked
>>>> that directly, by checking out the trees before and after, extracting two
>>>> functions in question from fs/read_write.c and lib/iov_iter.c (before and
>>>> after, resp.) and checking the diff between those.
>>>>
>>>> How certain is your bisection?
>>>
>>> The bisection is very reproducable.
>>>
>>> But, this looks now to be a compiler bug.  I'm using the latest version
>>> of clang and if I put "noinline" at the front of the function,
>>> everything works.
>>
>> Well, the compiler can do more invasive optimizations when inlining. If
>> you have buggy code that relies on some unspecified behavior, inlining
>> can change the behavior ... but going over that code, there isn't too
>> much action going on. At least nothing screamed at me.
> 
> Apart from all the optimisations that get rid off the 'pass be reference'
> parameters and strange conditional tests.
> Plenty of scope for the compiler getting it wrong.
> But nothing even vaguely illegal.

Not the first time that people blame the compiler to then figure out
that something else is wrong ... but maybe this time is different :)

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  8:48               ` David Hildenbrand
@ 2020-10-22  9:01                 ` Greg KH
  2020-10-22  9:06                   ` David Laight
  2020-10-22  9:19                   ` David Hildenbrand
  2020-10-22  9:02                 ` David Laight
  1 sibling, 2 replies; 171+ messages in thread
From: Greg KH @ 2020-10-22  9:01 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: David Laight, Al Viro, Nick Desaulniers, Christoph Hellwig,
	kernel-team, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 10:48:59AM +0200, David Hildenbrand wrote:
> On 22.10.20 10:40, David Laight wrote:
> > From: David Hildenbrand
> >> Sent: 22 October 2020 09:35
> >>
> >> On 22.10.20 10:26, Greg KH wrote:
> >>> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
> >>>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
> >>>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
> >>>>>> From: David Laight <David.Laight@ACULAB.COM>
> >>>>>>
> >>>>>> This lets the compiler inline it into import_iovec() generating
> >>>>>> much better code.
> >>>>>>
> >>>>>> Signed-off-by: David Laight <david.laight@aculab.com>
> >>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
> >>>>>> ---
> >>>>>>  fs/read_write.c | 179 ------------------------------------------------
> >>>>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>  2 files changed, 176 insertions(+), 179 deletions(-)
> >>>>>
> >>>>> Strangely, this commit causes a regression in Linus's tree right now.
> >>>>>
> >>>>> I can't really figure out what the regression is, only that this commit
> >>>>> triggers a "large Android system binary" from working properly.  There's
> >>>>> no kernel log messages anywhere, and I don't have any way to strace the
> >>>>> thing in the testing framework, so any hints that people can provide
> >>>>> would be most appreciated.
> >>>>
> >>>> It's a pure move - modulo changed line breaks in the argument lists
> >>>> the functions involved are identical before and after that (just checked
> >>>> that directly, by checking out the trees before and after, extracting two
> >>>> functions in question from fs/read_write.c and lib/iov_iter.c (before and
> >>>> after, resp.) and checking the diff between those.
> >>>>
> >>>> How certain is your bisection?
> >>>
> >>> The bisection is very reproducable.
> >>>
> >>> But, this looks now to be a compiler bug.  I'm using the latest version
> >>> of clang and if I put "noinline" at the front of the function,
> >>> everything works.
> >>
> >> Well, the compiler can do more invasive optimizations when inlining. If
> >> you have buggy code that relies on some unspecified behavior, inlining
> >> can change the behavior ... but going over that code, there isn't too
> >> much action going on. At least nothing screamed at me.
> > 
> > Apart from all the optimisations that get rid off the 'pass be reference'
> > parameters and strange conditional tests.
> > Plenty of scope for the compiler getting it wrong.
> > But nothing even vaguely illegal.
> 
> Not the first time that people blame the compiler to then figure out
> that something else is wrong ... but maybe this time is different :)

I agree, I hate to blame the compiler, that's almost never the real
problem, but this one sure "feels" like it.

I'm running some more tests, trying to narrow things down as just adding
a "noinline" to the function that got moved here doesn't work on Linus's
tree at the moment because the function was split into multiple
functions.

Give me a few hours...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  8:48               ` David Hildenbrand
  2020-10-22  9:01                 ` Greg KH
@ 2020-10-22  9:02                 ` David Laight
  1 sibling, 0 replies; 171+ messages in thread
From: David Laight @ 2020-10-22  9:02 UTC (permalink / raw)
  To: 'David Hildenbrand', Greg KH, Al Viro, Nick Desaulniers
  Cc: Christoph Hellwig, kernel-team, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, linux-arm-kernel, linux-kernel,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, sparclinux,
	linux-block, linux-scsi, linux-fsdevel, linux-aio, io-uring,
	linux-arch, linux-mm, netdev, keyrings, linux-security-module

From: David Hildenbrand
> Sent: 22 October 2020 09:49
...
> >>> But, this looks now to be a compiler bug.  I'm using the latest version
> >>> of clang and if I put "noinline" at the front of the function,
> >>> everything works.
> >>
> >> Well, the compiler can do more invasive optimizations when inlining. If
> >> you have buggy code that relies on some unspecified behavior, inlining
> >> can change the behavior ... but going over that code, there isn't too
> >> much action going on. At least nothing screamed at me.
> >
> > Apart from all the optimisations that get rid off the 'pass be reference'
> > parameters and strange conditional tests.
> > Plenty of scope for the compiler getting it wrong.
> > But nothing even vaguely illegal.
> 
> Not the first time that people blame the compiler to then figure out
> that something else is wrong ... but maybe this time is different :)

Usually down to missing asm 'memory' constraints...

Need to read the obj file to see what the compiler did.

The code must be 'approximately right' or nothing would run.
So I'd guess it has to do with > 8 fragments.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  9:01                 ` Greg KH
@ 2020-10-22  9:06                   ` David Laight
  2020-10-22  9:19                   ` David Hildenbrand
  1 sibling, 0 replies; 171+ messages in thread
From: David Laight @ 2020-10-22  9:06 UTC (permalink / raw)
  To: 'Greg KH', David Hildenbrand
  Cc: Al Viro, Nick Desaulniers, Christoph Hellwig, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: Greg KH
> Sent: 22 October 2020 10:02
...
> I'm running some more tests, trying to narrow things down as just adding
> a "noinline" to the function that got moved here doesn't work on Linus's
> tree at the moment because the function was split into multiple
> functions.

I was going to look at that once rc2 is in - and the kernel is
likely to work.

I suspect the split version doesn't get inlined the same way?
Which leaves the horrid argument conversion the inline got
rid of back there again.

Which all rather begs the question of why the compiler doesn't
generate the expected code.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  9:01                 ` Greg KH
  2020-10-22  9:06                   ` David Laight
@ 2020-10-22  9:19                   ` David Hildenbrand
  2020-10-22  9:25                     ` David Hildenbrand
  2020-10-22  9:28                     ` David Laight
  1 sibling, 2 replies; 171+ messages in thread
From: David Hildenbrand @ 2020-10-22  9:19 UTC (permalink / raw)
  To: Greg KH
  Cc: David Laight, Al Viro, Nick Desaulniers, Christoph Hellwig,
	kernel-team, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On 22.10.20 11:01, Greg KH wrote:
> On Thu, Oct 22, 2020 at 10:48:59AM +0200, David Hildenbrand wrote:
>> On 22.10.20 10:40, David Laight wrote:
>>> From: David Hildenbrand
>>>> Sent: 22 October 2020 09:35
>>>>
>>>> On 22.10.20 10:26, Greg KH wrote:
>>>>> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
>>>>>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
>>>>>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
>>>>>>>> From: David Laight <David.Laight@ACULAB.COM>
>>>>>>>>
>>>>>>>> This lets the compiler inline it into import_iovec() generating
>>>>>>>> much better code.
>>>>>>>>
>>>>>>>> Signed-off-by: David Laight <david.laight@aculab.com>
>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>> ---
>>>>>>>>  fs/read_write.c | 179 ------------------------------------------------
>>>>>>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>  2 files changed, 176 insertions(+), 179 deletions(-)
>>>>>>>
>>>>>>> Strangely, this commit causes a regression in Linus's tree right now.
>>>>>>>
>>>>>>> I can't really figure out what the regression is, only that this commit
>>>>>>> triggers a "large Android system binary" from working properly.  There's
>>>>>>> no kernel log messages anywhere, and I don't have any way to strace the
>>>>>>> thing in the testing framework, so any hints that people can provide
>>>>>>> would be most appreciated.
>>>>>>
>>>>>> It's a pure move - modulo changed line breaks in the argument lists
>>>>>> the functions involved are identical before and after that (just checked
>>>>>> that directly, by checking out the trees before and after, extracting two
>>>>>> functions in question from fs/read_write.c and lib/iov_iter.c (before and
>>>>>> after, resp.) and checking the diff between those.
>>>>>>
>>>>>> How certain is your bisection?
>>>>>
>>>>> The bisection is very reproducable.
>>>>>
>>>>> But, this looks now to be a compiler bug.  I'm using the latest version
>>>>> of clang and if I put "noinline" at the front of the function,
>>>>> everything works.
>>>>
>>>> Well, the compiler can do more invasive optimizations when inlining. If
>>>> you have buggy code that relies on some unspecified behavior, inlining
>>>> can change the behavior ... but going over that code, there isn't too
>>>> much action going on. At least nothing screamed at me.
>>>
>>> Apart from all the optimisations that get rid off the 'pass be reference'
>>> parameters and strange conditional tests.
>>> Plenty of scope for the compiler getting it wrong.
>>> But nothing even vaguely illegal.
>>
>> Not the first time that people blame the compiler to then figure out
>> that something else is wrong ... but maybe this time is different :)
> 
> I agree, I hate to blame the compiler, that's almost never the real
> problem, but this one sure "feels" like it.
> 
> I'm running some more tests, trying to narrow things down as just adding
> a "noinline" to the function that got moved here doesn't work on Linus's
> tree at the moment because the function was split into multiple
> functions.
> 
> Give me a few hours...

I might be wrong but

a) import_iovec() uses:
- unsigned nr_segs -> int
- unsigned fast_segs -> int
b) rw_copy_check_uvector() uses:
- unsigned long nr_segs -> long
- unsigned long fast_seg -> long

So when calling rw_copy_check_uvector(), we have to zero-extend the
registers used for passing the arguments. That's definitely done when
calling the function explicitly. Maybe when inlining something is messed up?

Just a thought ...

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  9:19                   ` David Hildenbrand
@ 2020-10-22  9:25                     ` David Hildenbrand
  2020-10-22  9:32                       ` David Laight
  2020-10-22  9:28                     ` David Laight
  1 sibling, 1 reply; 171+ messages in thread
From: David Hildenbrand @ 2020-10-22  9:25 UTC (permalink / raw)
  To: Greg KH
  Cc: David Laight, Al Viro, Nick Desaulniers, Christoph Hellwig,
	kernel-team, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On 22.10.20 11:19, David Hildenbrand wrote:
> On 22.10.20 11:01, Greg KH wrote:
>> On Thu, Oct 22, 2020 at 10:48:59AM +0200, David Hildenbrand wrote:
>>> On 22.10.20 10:40, David Laight wrote:
>>>> From: David Hildenbrand
>>>>> Sent: 22 October 2020 09:35
>>>>>
>>>>> On 22.10.20 10:26, Greg KH wrote:
>>>>>> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
>>>>>>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
>>>>>>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
>>>>>>>>> From: David Laight <David.Laight@ACULAB.COM>
>>>>>>>>>
>>>>>>>>> This lets the compiler inline it into import_iovec() generating
>>>>>>>>> much better code.
>>>>>>>>>
>>>>>>>>> Signed-off-by: David Laight <david.laight@aculab.com>
>>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>> ---
>>>>>>>>>  fs/read_write.c | 179 ------------------------------------------------
>>>>>>>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>  2 files changed, 176 insertions(+), 179 deletions(-)
>>>>>>>>
>>>>>>>> Strangely, this commit causes a regression in Linus's tree right now.
>>>>>>>>
>>>>>>>> I can't really figure out what the regression is, only that this commit
>>>>>>>> triggers a "large Android system binary" from working properly.  There's
>>>>>>>> no kernel log messages anywhere, and I don't have any way to strace the
>>>>>>>> thing in the testing framework, so any hints that people can provide
>>>>>>>> would be most appreciated.
>>>>>>>
>>>>>>> It's a pure move - modulo changed line breaks in the argument lists
>>>>>>> the functions involved are identical before and after that (just checked
>>>>>>> that directly, by checking out the trees before and after, extracting two
>>>>>>> functions in question from fs/read_write.c and lib/iov_iter.c (before and
>>>>>>> after, resp.) and checking the diff between those.
>>>>>>>
>>>>>>> How certain is your bisection?
>>>>>>
>>>>>> The bisection is very reproducable.
>>>>>>
>>>>>> But, this looks now to be a compiler bug.  I'm using the latest version
>>>>>> of clang and if I put "noinline" at the front of the function,
>>>>>> everything works.
>>>>>
>>>>> Well, the compiler can do more invasive optimizations when inlining. If
>>>>> you have buggy code that relies on some unspecified behavior, inlining
>>>>> can change the behavior ... but going over that code, there isn't too
>>>>> much action going on. At least nothing screamed at me.
>>>>
>>>> Apart from all the optimisations that get rid off the 'pass be reference'
>>>> parameters and strange conditional tests.
>>>> Plenty of scope for the compiler getting it wrong.
>>>> But nothing even vaguely illegal.
>>>
>>> Not the first time that people blame the compiler to then figure out
>>> that something else is wrong ... but maybe this time is different :)
>>
>> I agree, I hate to blame the compiler, that's almost never the real
>> problem, but this one sure "feels" like it.
>>
>> I'm running some more tests, trying to narrow things down as just adding
>> a "noinline" to the function that got moved here doesn't work on Linus's
>> tree at the moment because the function was split into multiple
>> functions.
>>
>> Give me a few hours...
> 
> I might be wrong but
> 
> a) import_iovec() uses:
> - unsigned nr_segs -> int
> - unsigned fast_segs -> int
> b) rw_copy_check_uvector() uses:
> - unsigned long nr_segs -> long
> - unsigned long fast_seg -> long
> 
> So when calling rw_copy_check_uvector(), we have to zero-extend the
> registers used for passing the arguments. That's definitely done when
> calling the function explicitly. Maybe when inlining something is messed up?
> 
> Just a thought ...
> 

... especially because I recall that clang and gcc behave slightly
differently:

https://github.com/hjl-tools/x86-psABI/issues/2

"Function args are different: narrow types are sign or zero extended to
32 bits, depending on their type. clang depends on this for incoming
args, but gcc doesn't make that assumption. But both compilers do it
when calling, so gcc code can call clang code.

The upper 32 bits of registers are always undefined garbage for types
smaller than 64 bits."

Again, just a thought.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  9:19                   ` David Hildenbrand
  2020-10-22  9:25                     ` David Hildenbrand
@ 2020-10-22  9:28                     ` David Laight
  1 sibling, 0 replies; 171+ messages in thread
From: David Laight @ 2020-10-22  9:28 UTC (permalink / raw)
  To: 'David Hildenbrand', Greg KH
  Cc: Al Viro, Nick Desaulniers, Christoph Hellwig, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: David Hildenbrand
> Sent: 22 October 2020 10:19
> 
> On 22.10.20 11:01, Greg KH wrote:
> > On Thu, Oct 22, 2020 at 10:48:59AM +0200, David Hildenbrand wrote:
> >> On 22.10.20 10:40, David Laight wrote:
> >>> From: David Hildenbrand
> >>>> Sent: 22 October 2020 09:35
> >>>>
> >>>> On 22.10.20 10:26, Greg KH wrote:
> >>>>> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
> >>>>>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
> >>>>>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
> >>>>>>>> From: David Laight <David.Laight@ACULAB.COM>
> >>>>>>>>
> >>>>>>>> This lets the compiler inline it into import_iovec() generating
> >>>>>>>> much better code.
> >>>>>>>>
> >>>>>>>> Signed-off-by: David Laight <david.laight@aculab.com>
> >>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
> >>>>>>>> ---
> >>>>>>>>  fs/read_write.c | 179 ------------------------------------------------
> >>>>>>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>  2 files changed, 176 insertions(+), 179 deletions(-)
> >>>>>>>
> >>>>>>> Strangely, this commit causes a regression in Linus's tree right now.
> >>>>>>>
> >>>>>>> I can't really figure out what the regression is, only that this commit
> >>>>>>> triggers a "large Android system binary" from working properly.  There's
> >>>>>>> no kernel log messages anywhere, and I don't have any way to strace the
> >>>>>>> thing in the testing framework, so any hints that people can provide
> >>>>>>> would be most appreciated.
> >>>>>>
> >>>>>> It's a pure move - modulo changed line breaks in the argument lists
> >>>>>> the functions involved are identical before and after that (just checked
> >>>>>> that directly, by checking out the trees before and after, extracting two
> >>>>>> functions in question from fs/read_write.c and lib/iov_iter.c (before and
> >>>>>> after, resp.) and checking the diff between those.
> >>>>>>
> >>>>>> How certain is your bisection?
> >>>>>
> >>>>> The bisection is very reproducable.
> >>>>>
> >>>>> But, this looks now to be a compiler bug.  I'm using the latest version
> >>>>> of clang and if I put "noinline" at the front of the function,
> >>>>> everything works.
> >>>>
> >>>> Well, the compiler can do more invasive optimizations when inlining. If
> >>>> you have buggy code that relies on some unspecified behavior, inlining
> >>>> can change the behavior ... but going over that code, there isn't too
> >>>> much action going on. At least nothing screamed at me.
> >>>
> >>> Apart from all the optimisations that get rid off the 'pass be reference'
> >>> parameters and strange conditional tests.
> >>> Plenty of scope for the compiler getting it wrong.
> >>> But nothing even vaguely illegal.
> >>
> >> Not the first time that people blame the compiler to then figure out
> >> that something else is wrong ... but maybe this time is different :)
> >
> > I agree, I hate to blame the compiler, that's almost never the real
> > problem, but this one sure "feels" like it.
> >
> > I'm running some more tests, trying to narrow things down as just adding
> > a "noinline" to the function that got moved here doesn't work on Linus's
> > tree at the moment because the function was split into multiple
> > functions.
> >
> > Give me a few hours...
> 
> I might be wrong but
> 
> a) import_iovec() uses:
> - unsigned nr_segs -> int
> - unsigned fast_segs -> int
> b) rw_copy_check_uvector() uses:
> - unsigned long nr_segs -> long
> - unsigned long fast_seg -> long
> 
> So when calling rw_copy_check_uvector(), we have to zero-extend the
> registers used for passing the arguments. That's definitely done when
> calling the function explicitly. Maybe when inlining something is messed up?

That's also not needed on x86-64 - the high bits get cleared by 32bit writes.
But, IIRC, arm64 leaves them unchanged or undefined.

I guessing that every array access uses a *(Rx + Ry) addressing
mode. So indexing an array even with 'unsigned int' requires
an explicit zero-extend on arm64?
(x86-64 ends up with an explicit sign extend when indexing an
array with 'signed int'.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  9:25                     ` David Hildenbrand
@ 2020-10-22  9:32                       ` David Laight
  2020-10-22  9:36                         ` David Hildenbrand
  0 siblings, 1 reply; 171+ messages in thread
From: David Laight @ 2020-10-22  9:32 UTC (permalink / raw)
  To: 'David Hildenbrand', Greg KH
  Cc: Al Viro, Nick Desaulniers, Christoph Hellwig, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: David Hildenbrand
> Sent: 22 October 2020 10:25
...
> ... especially because I recall that clang and gcc behave slightly
> differently:
> 
> https://github.com/hjl-tools/x86-psABI/issues/2
> 
> "Function args are different: narrow types are sign or zero extended to
> 32 bits, depending on their type. clang depends on this for incoming
> args, but gcc doesn't make that assumption. But both compilers do it
> when calling, so gcc code can call clang code.

It really is best to use 'int' (or even 'long') for all numeric
arguments (and results) regardless of the domain of the value.

Related, I've always worried about 'bool'....

> The upper 32 bits of registers are always undefined garbage for types
> smaller than 64 bits."

On x86-64 the high bits are zeroed by all 32bit loads.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  9:32                       ` David Laight
@ 2020-10-22  9:36                         ` David Hildenbrand
  2020-10-22 10:48                           ` Greg KH
  2020-10-22 13:23                           ` Christoph Hellwig
  0 siblings, 2 replies; 171+ messages in thread
From: David Hildenbrand @ 2020-10-22  9:36 UTC (permalink / raw)
  To: David Laight, Greg KH
  Cc: Al Viro, Nick Desaulniers, Christoph Hellwig, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On 22.10.20 11:32, David Laight wrote:
> From: David Hildenbrand
>> Sent: 22 October 2020 10:25
> ...
>> ... especially because I recall that clang and gcc behave slightly
>> differently:
>>
>> https://github.com/hjl-tools/x86-psABI/issues/2
>>
>> "Function args are different: narrow types are sign or zero extended to
>> 32 bits, depending on their type. clang depends on this for incoming
>> args, but gcc doesn't make that assumption. But both compilers do it
>> when calling, so gcc code can call clang code.
> 
> It really is best to use 'int' (or even 'long') for all numeric
> arguments (and results) regardless of the domain of the value.
> 
> Related, I've always worried about 'bool'....
> 
>> The upper 32 bits of registers are always undefined garbage for types
>> smaller than 64 bits."
> 
> On x86-64 the high bits are zeroed by all 32bit loads.

Yeah, but does not help here.


My thinking: if the compiler that calls import_iovec() has garbage in
the upper 32 bit

a) gcc will zero it out and not rely on it being zero.
b) clang will not zero it out, assuming it is zero.

But

a) will zero it out when calling the !inlined variant
b) clang will zero it out when calling the !inlined variant

When inlining, b) strikes. We access garbage. That would mean that we
have calling code that's not generated by clang/gcc IIUC.

We can test easily by changing the parameters instead of adding an "inline".

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  9:36                         ` David Hildenbrand
@ 2020-10-22 10:48                           ` Greg KH
  2020-10-22 12:18                             ` Greg KH
  2020-10-22 13:23                           ` Christoph Hellwig
  1 sibling, 1 reply; 171+ messages in thread
From: Greg KH @ 2020-10-22 10:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: David Laight, Al Viro, Nick Desaulniers, Christoph Hellwig,
	kernel-team, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> On 22.10.20 11:32, David Laight wrote:
> > From: David Hildenbrand
> >> Sent: 22 October 2020 10:25
> > ...
> >> ... especially because I recall that clang and gcc behave slightly
> >> differently:
> >>
> >> https://github.com/hjl-tools/x86-psABI/issues/2
> >>
> >> "Function args are different: narrow types are sign or zero extended to
> >> 32 bits, depending on their type. clang depends on this for incoming
> >> args, but gcc doesn't make that assumption. But both compilers do it
> >> when calling, so gcc code can call clang code.
> > 
> > It really is best to use 'int' (or even 'long') for all numeric
> > arguments (and results) regardless of the domain of the value.
> > 
> > Related, I've always worried about 'bool'....
> > 
> >> The upper 32 bits of registers are always undefined garbage for types
> >> smaller than 64 bits."
> > 
> > On x86-64 the high bits are zeroed by all 32bit loads.
> 
> Yeah, but does not help here.
> 
> 
> My thinking: if the compiler that calls import_iovec() has garbage in
> the upper 32 bit
> 
> a) gcc will zero it out and not rely on it being zero.
> b) clang will not zero it out, assuming it is zero.
> 
> But
> 
> a) will zero it out when calling the !inlined variant
> b) clang will zero it out when calling the !inlined variant
> 
> When inlining, b) strikes. We access garbage. That would mean that we
> have calling code that's not generated by clang/gcc IIUC.
> 
> We can test easily by changing the parameters instead of adding an "inline".

Let me try that as well, as I seem to have a good reproducer, but it
takes a while to run...

greg k-h

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 10:48                           ` Greg KH
@ 2020-10-22 12:18                             ` Greg KH
  2020-10-22 12:42                               ` David Hildenbrand
  0 siblings, 1 reply; 171+ messages in thread
From: Greg KH @ 2020-10-22 12:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: David Laight, Al Viro, Nick Desaulniers, Christoph Hellwig,
	kernel-team, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 12:48:05PM +0200, Greg KH wrote:
> On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> > On 22.10.20 11:32, David Laight wrote:
> > > From: David Hildenbrand
> > >> Sent: 22 October 2020 10:25
> > > ...
> > >> ... especially because I recall that clang and gcc behave slightly
> > >> differently:
> > >>
> > >> https://github.com/hjl-tools/x86-psABI/issues/2
> > >>
> > >> "Function args are different: narrow types are sign or zero extended to
> > >> 32 bits, depending on their type. clang depends on this for incoming
> > >> args, but gcc doesn't make that assumption. But both compilers do it
> > >> when calling, so gcc code can call clang code.
> > > 
> > > It really is best to use 'int' (or even 'long') for all numeric
> > > arguments (and results) regardless of the domain of the value.
> > > 
> > > Related, I've always worried about 'bool'....
> > > 
> > >> The upper 32 bits of registers are always undefined garbage for types
> > >> smaller than 64 bits."
> > > 
> > > On x86-64 the high bits are zeroed by all 32bit loads.
> > 
> > Yeah, but does not help here.
> > 
> > 
> > My thinking: if the compiler that calls import_iovec() has garbage in
> > the upper 32 bit
> > 
> > a) gcc will zero it out and not rely on it being zero.
> > b) clang will not zero it out, assuming it is zero.
> > 
> > But
> > 
> > a) will zero it out when calling the !inlined variant
> > b) clang will zero it out when calling the !inlined variant
> > 
> > When inlining, b) strikes. We access garbage. That would mean that we
> > have calling code that's not generated by clang/gcc IIUC.
> > 
> > We can test easily by changing the parameters instead of adding an "inline".
> 
> Let me try that as well, as I seem to have a good reproducer, but it
> takes a while to run...

Ok, that didn't work.

And I can't seem to "fix" this by adding noinline to patches further
along in the patch series (because this commit's function is no longer
present due to later patches.)

Will keep digging...

greg k-h

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 12:18                             ` Greg KH
@ 2020-10-22 12:42                               ` David Hildenbrand
  2020-10-22 12:57                                 ` Greg KH
  0 siblings, 1 reply; 171+ messages in thread
From: David Hildenbrand @ 2020-10-22 12:42 UTC (permalink / raw)
  To: Greg KH
  Cc: David Laight, Al Viro, Nick Desaulniers, Christoph Hellwig,
	kernel-team, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On 22.10.20 14:18, Greg KH wrote:
> On Thu, Oct 22, 2020 at 12:48:05PM +0200, Greg KH wrote:
>> On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
>>> On 22.10.20 11:32, David Laight wrote:
>>>> From: David Hildenbrand
>>>>> Sent: 22 October 2020 10:25
>>>> ...
>>>>> ... especially because I recall that clang and gcc behave slightly
>>>>> differently:
>>>>>
>>>>> https://github.com/hjl-tools/x86-psABI/issues/2
>>>>>
>>>>> "Function args are different: narrow types are sign or zero extended to
>>>>> 32 bits, depending on their type. clang depends on this for incoming
>>>>> args, but gcc doesn't make that assumption. But both compilers do it
>>>>> when calling, so gcc code can call clang code.
>>>>
>>>> It really is best to use 'int' (or even 'long') for all numeric
>>>> arguments (and results) regardless of the domain of the value.
>>>>
>>>> Related, I've always worried about 'bool'....
>>>>
>>>>> The upper 32 bits of registers are always undefined garbage for types
>>>>> smaller than 64 bits."
>>>>
>>>> On x86-64 the high bits are zeroed by all 32bit loads.
>>>
>>> Yeah, but does not help here.
>>>
>>>
>>> My thinking: if the compiler that calls import_iovec() has garbage in
>>> the upper 32 bit
>>>
>>> a) gcc will zero it out and not rely on it being zero.
>>> b) clang will not zero it out, assuming it is zero.
>>>
>>> But
>>>
>>> a) will zero it out when calling the !inlined variant
>>> b) clang will zero it out when calling the !inlined variant
>>>
>>> When inlining, b) strikes. We access garbage. That would mean that we
>>> have calling code that's not generated by clang/gcc IIUC.
>>>
>>> We can test easily by changing the parameters instead of adding an "inline".
>>
>> Let me try that as well, as I seem to have a good reproducer, but it
>> takes a while to run...
> 
> Ok, that didn't work.
> 
> And I can't seem to "fix" this by adding noinline to patches further
> along in the patch series (because this commit's function is no longer
> present due to later patches.)

We might have the same issues with iovec_from_user() and friends now.

> 
> Will keep digging...
> 
> greg k-h
> 


Might be worth to give this a try, just to see if it's related to
garbage in upper 32 bit and the way clang is handling it (might be a BUG
in clang, though):


diff --git a/include/linux/uio.h b/include/linux/uio.h
index 72d88566694e..7527298c6b56 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -267,7 +267,7 @@ size_t hash_and_copy_to_iter(const void *addr,
size_t bytes, void *hashp,
                struct iov_iter *i);

 struct iovec *iovec_from_user(const struct iovec __user *uvector,
-               unsigned long nr_segs, unsigned long fast_segs,
+               unsigned nr_segs, unsigned fast_segs,
                struct iovec *fast_iov, bool compat);
 ssize_t import_iovec(int type, const struct iovec __user *uvec,
                 unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1635111c5bd2..58417f1916dc 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1652,7 +1652,7 @@ const void *dup_iter(struct iov_iter *new, struct
iov_iter *old, gfp_t flags)
 EXPORT_SYMBOL(dup_iter);

 static int copy_compat_iovec_from_user(struct iovec *iov,
-               const struct iovec __user *uvec, unsigned long nr_segs)
+               const struct iovec __user *uvec, unsigned nr_segs)
 {
        const struct compat_iovec __user *uiov =
                (const struct compat_iovec __user *)uvec;
@@ -1684,7 +1684,7 @@ static int copy_compat_iovec_from_user(struct
iovec *iov,
 }

 static int copy_iovec_from_user(struct iovec *iov,
-               const struct iovec __user *uvec, unsigned long nr_segs)
+               const struct iovec __user *uvec, unsigned nr_segs)
 {
        unsigned long seg;

@@ -1699,7 +1699,7 @@ static int copy_iovec_from_user(struct iovec *iov,
 }

 struct iovec *iovec_from_user(const struct iovec __user *uvec,
-               unsigned long nr_segs, unsigned long fast_segs,
+               unsigned nr_segs, unsigned fast_segs,
                struct iovec *fast_iov, bool compat)
 {
        struct iovec *iov = fast_iov;
@@ -1738,7 +1738,7 @@ ssize_t __import_iovec(int type, const struct
iovec __user *uvec,
                 struct iov_iter *i, bool compat)
 {
        ssize_t total_len = 0;
-       unsigned long seg;
+       unsigned seg;
        struct iovec *iov;

        iov = iovec_from_user(uvec, nr_segs, fast_segs, *iovp, compat);


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 12:42                               ` David Hildenbrand
@ 2020-10-22 12:57                                 ` Greg KH
  2020-10-22 13:50                                   ` Greg KH
  0 siblings, 1 reply; 171+ messages in thread
From: Greg KH @ 2020-10-22 12:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: David Laight, Al Viro, Nick Desaulniers, Christoph Hellwig,
	kernel-team, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 02:42:24PM +0200, David Hildenbrand wrote:
> On 22.10.20 14:18, Greg KH wrote:
> > On Thu, Oct 22, 2020 at 12:48:05PM +0200, Greg KH wrote:
> >> On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> >>> On 22.10.20 11:32, David Laight wrote:
> >>>> From: David Hildenbrand
> >>>>> Sent: 22 October 2020 10:25
> >>>> ...
> >>>>> ... especially because I recall that clang and gcc behave slightly
> >>>>> differently:
> >>>>>
> >>>>> https://github.com/hjl-tools/x86-psABI/issues/2
> >>>>>
> >>>>> "Function args are different: narrow types are sign or zero extended to
> >>>>> 32 bits, depending on their type. clang depends on this for incoming
> >>>>> args, but gcc doesn't make that assumption. But both compilers do it
> >>>>> when calling, so gcc code can call clang code.
> >>>>
> >>>> It really is best to use 'int' (or even 'long') for all numeric
> >>>> arguments (and results) regardless of the domain of the value.
> >>>>
> >>>> Related, I've always worried about 'bool'....
> >>>>
> >>>>> The upper 32 bits of registers are always undefined garbage for types
> >>>>> smaller than 64 bits."
> >>>>
> >>>> On x86-64 the high bits are zeroed by all 32bit loads.
> >>>
> >>> Yeah, but does not help here.
> >>>
> >>>
> >>> My thinking: if the compiler that calls import_iovec() has garbage in
> >>> the upper 32 bit
> >>>
> >>> a) gcc will zero it out and not rely on it being zero.
> >>> b) clang will not zero it out, assuming it is zero.
> >>>
> >>> But
> >>>
> >>> a) will zero it out when calling the !inlined variant
> >>> b) clang will zero it out when calling the !inlined variant
> >>>
> >>> When inlining, b) strikes. We access garbage. That would mean that we
> >>> have calling code that's not generated by clang/gcc IIUC.
> >>>
> >>> We can test easily by changing the parameters instead of adding an "inline".
> >>
> >> Let me try that as well, as I seem to have a good reproducer, but it
> >> takes a while to run...
> > 
> > Ok, that didn't work.
> > 
> > And I can't seem to "fix" this by adding noinline to patches further
> > along in the patch series (because this commit's function is no longer
> > present due to later patches.)
> 
> We might have the same issues with iovec_from_user() and friends now.
> 
> > 
> > Will keep digging...
> > 
> > greg k-h
> > 
> 
> 
> Might be worth to give this a try, just to see if it's related to
> garbage in upper 32 bit and the way clang is handling it (might be a BUG
> in clang, though):
> 
> 
> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 72d88566694e..7527298c6b56 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -267,7 +267,7 @@ size_t hash_and_copy_to_iter(const void *addr,
> size_t bytes, void *hashp,
>                 struct iov_iter *i);
> 
>  struct iovec *iovec_from_user(const struct iovec __user *uvector,
> -               unsigned long nr_segs, unsigned long fast_segs,
> +               unsigned nr_segs, unsigned fast_segs,
>                 struct iovec *fast_iov, bool compat);
>  ssize_t import_iovec(int type, const struct iovec __user *uvec,
>                  unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 1635111c5bd2..58417f1916dc 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1652,7 +1652,7 @@ const void *dup_iter(struct iov_iter *new, struct
> iov_iter *old, gfp_t flags)
>  EXPORT_SYMBOL(dup_iter);
> 
>  static int copy_compat_iovec_from_user(struct iovec *iov,
> -               const struct iovec __user *uvec, unsigned long nr_segs)
> +               const struct iovec __user *uvec, unsigned nr_segs)
>  {
>         const struct compat_iovec __user *uiov =
>                 (const struct compat_iovec __user *)uvec;
> @@ -1684,7 +1684,7 @@ static int copy_compat_iovec_from_user(struct
> iovec *iov,
>  }
> 
>  static int copy_iovec_from_user(struct iovec *iov,
> -               const struct iovec __user *uvec, unsigned long nr_segs)
> +               const struct iovec __user *uvec, unsigned nr_segs)
>  {
>         unsigned long seg;
> 
> @@ -1699,7 +1699,7 @@ static int copy_iovec_from_user(struct iovec *iov,
>  }
> 
>  struct iovec *iovec_from_user(const struct iovec __user *uvec,
> -               unsigned long nr_segs, unsigned long fast_segs,
> +               unsigned nr_segs, unsigned fast_segs,
>                 struct iovec *fast_iov, bool compat)
>  {
>         struct iovec *iov = fast_iov;
> @@ -1738,7 +1738,7 @@ ssize_t __import_iovec(int type, const struct
> iovec __user *uvec,
>                  struct iov_iter *i, bool compat)
>  {
>         ssize_t total_len = 0;
> -       unsigned long seg;
> +       unsigned seg;
>         struct iovec *iov;
> 
>         iov = iovec_from_user(uvec, nr_segs, fast_segs, *iovp, compat);
> 

Ah, I tested the other way around, making everything "unsigned long"
instead.  Will go try this too, as other tests are still running...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22  9:36                         ` David Hildenbrand
  2020-10-22 10:48                           ` Greg KH
@ 2020-10-22 13:23                           ` Christoph Hellwig
  2020-10-22 16:35                             ` David Laight
  1 sibling, 1 reply; 171+ messages in thread
From: Christoph Hellwig @ 2020-10-22 13:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: David Laight, Greg KH, Al Viro, Nick Desaulniers,
	Christoph Hellwig, kernel-team, Andrew Morton, Jens Axboe,
	Arnd Bergmann, David Howells, linux-arm-kernel, linux-kernel,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, sparclinux,
	linux-block, linux-scsi, linux-fsdevel, linux-aio, io-uring,
	linux-arch, linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> My thinking: if the compiler that calls import_iovec() has garbage in
> the upper 32 bit
> 
> a) gcc will zero it out and not rely on it being zero.
> b) clang will not zero it out, assuming it is zero.
> 
> But
> 
> a) will zero it out when calling the !inlined variant
> b) clang will zero it out when calling the !inlined variant
> 
> When inlining, b) strikes. We access garbage. That would mean that we
> have calling code that's not generated by clang/gcc IIUC.

Most callchains of import_iovec start with the assembly syscall wrappers.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 12:57                                 ` Greg KH
@ 2020-10-22 13:50                                   ` Greg KH
       [not found]                                     ` <CAK8P3a1B7OVdyzW0-97JwzZiwp0D0fnSfyete16QTvPp_1m07A@mail.gmail.com>
  2020-10-23 12:46                                     ` David Laight
  0 siblings, 2 replies; 171+ messages in thread
From: Greg KH @ 2020-10-22 13:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: David Laight, Al Viro, Nick Desaulniers, Christoph Hellwig,
	kernel-team, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 02:57:59PM +0200, Greg KH wrote:
> On Thu, Oct 22, 2020 at 02:42:24PM +0200, David Hildenbrand wrote:
> > On 22.10.20 14:18, Greg KH wrote:
> > > On Thu, Oct 22, 2020 at 12:48:05PM +0200, Greg KH wrote:
> > >> On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> > >>> On 22.10.20 11:32, David Laight wrote:
> > >>>> From: David Hildenbrand
> > >>>>> Sent: 22 October 2020 10:25
> > >>>> ...
> > >>>>> ... especially because I recall that clang and gcc behave slightly
> > >>>>> differently:
> > >>>>>
> > >>>>> https://github.com/hjl-tools/x86-psABI/issues/2
> > >>>>>
> > >>>>> "Function args are different: narrow types are sign or zero extended to
> > >>>>> 32 bits, depending on their type. clang depends on this for incoming
> > >>>>> args, but gcc doesn't make that assumption. But both compilers do it
> > >>>>> when calling, so gcc code can call clang code.
> > >>>>
> > >>>> It really is best to use 'int' (or even 'long') for all numeric
> > >>>> arguments (and results) regardless of the domain of the value.
> > >>>>
> > >>>> Related, I've always worried about 'bool'....
> > >>>>
> > >>>>> The upper 32 bits of registers are always undefined garbage for types
> > >>>>> smaller than 64 bits."
> > >>>>
> > >>>> On x86-64 the high bits are zeroed by all 32bit loads.
> > >>>
> > >>> Yeah, but does not help here.
> > >>>
> > >>>
> > >>> My thinking: if the compiler that calls import_iovec() has garbage in
> > >>> the upper 32 bit
> > >>>
> > >>> a) gcc will zero it out and not rely on it being zero.
> > >>> b) clang will not zero it out, assuming it is zero.
> > >>>
> > >>> But
> > >>>
> > >>> a) will zero it out when calling the !inlined variant
> > >>> b) clang will zero it out when calling the !inlined variant
> > >>>
> > >>> When inlining, b) strikes. We access garbage. That would mean that we
> > >>> have calling code that's not generated by clang/gcc IIUC.
> > >>>
> > >>> We can test easily by changing the parameters instead of adding an "inline".
> > >>
> > >> Let me try that as well, as I seem to have a good reproducer, but it
> > >> takes a while to run...
> > > 
> > > Ok, that didn't work.
> > > 
> > > And I can't seem to "fix" this by adding noinline to patches further
> > > along in the patch series (because this commit's function is no longer
> > > present due to later patches.)
> > 
> > We might have the same issues with iovec_from_user() and friends now.
> > 
> > > 
> > > Will keep digging...
> > > 
> > > greg k-h
> > > 
> > 
> > 
> > Might be worth to give this a try, just to see if it's related to
> > garbage in upper 32 bit and the way clang is handling it (might be a BUG
> > in clang, though):
> > 
> > 
> > diff --git a/include/linux/uio.h b/include/linux/uio.h
> > index 72d88566694e..7527298c6b56 100644
> > --- a/include/linux/uio.h
> > +++ b/include/linux/uio.h
> > @@ -267,7 +267,7 @@ size_t hash_and_copy_to_iter(const void *addr,
> > size_t bytes, void *hashp,
> >                 struct iov_iter *i);
> > 
> >  struct iovec *iovec_from_user(const struct iovec __user *uvector,
> > -               unsigned long nr_segs, unsigned long fast_segs,
> > +               unsigned nr_segs, unsigned fast_segs,
> >                 struct iovec *fast_iov, bool compat);
> >  ssize_t import_iovec(int type, const struct iovec __user *uvec,
> >                  unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
> > diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> > index 1635111c5bd2..58417f1916dc 100644
> > --- a/lib/iov_iter.c
> > +++ b/lib/iov_iter.c
> > @@ -1652,7 +1652,7 @@ const void *dup_iter(struct iov_iter *new, struct
> > iov_iter *old, gfp_t flags)
> >  EXPORT_SYMBOL(dup_iter);
> > 
> >  static int copy_compat_iovec_from_user(struct iovec *iov,
> > -               const struct iovec __user *uvec, unsigned long nr_segs)
> > +               const struct iovec __user *uvec, unsigned nr_segs)
> >  {
> >         const struct compat_iovec __user *uiov =
> >                 (const struct compat_iovec __user *)uvec;
> > @@ -1684,7 +1684,7 @@ static int copy_compat_iovec_from_user(struct
> > iovec *iov,
> >  }
> > 
> >  static int copy_iovec_from_user(struct iovec *iov,
> > -               const struct iovec __user *uvec, unsigned long nr_segs)
> > +               const struct iovec __user *uvec, unsigned nr_segs)
> >  {
> >         unsigned long seg;
> > 
> > @@ -1699,7 +1699,7 @@ static int copy_iovec_from_user(struct iovec *iov,
> >  }
> > 
> >  struct iovec *iovec_from_user(const struct iovec __user *uvec,
> > -               unsigned long nr_segs, unsigned long fast_segs,
> > +               unsigned nr_segs, unsigned fast_segs,
> >                 struct iovec *fast_iov, bool compat)
> >  {
> >         struct iovec *iov = fast_iov;
> > @@ -1738,7 +1738,7 @@ ssize_t __import_iovec(int type, const struct
> > iovec __user *uvec,
> >                  struct iov_iter *i, bool compat)
> >  {
> >         ssize_t total_len = 0;
> > -       unsigned long seg;
> > +       unsigned seg;
> >         struct iovec *iov;
> > 
> >         iov = iovec_from_user(uvec, nr_segs, fast_segs, *iovp, compat);
> > 
> 
> Ah, I tested the other way around, making everything "unsigned long"
> instead.  Will go try this too, as other tests are still running...

Ok, no, this didn't work either.

Nick, I think I need some compiler help here.  Any ideas?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
       [not found]                                     ` <CAK8P3a1B7OVdyzW0-97JwzZiwp0D0fnSfyete16QTvPp_1m07A@mail.gmail.com>
@ 2020-10-22 14:40                                       ` Greg KH
  2020-10-22 16:15                                         ` David Laight
  0 siblings, 1 reply; 171+ messages in thread
From: Greg KH @ 2020-10-22 14:40 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: David Hildenbrand, David Laight, Al Viro, Nick Desaulniers,
	Christoph Hellwig, kernel-team, Andrew Morton, Jens Axboe,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 04:28:20PM +0200, Arnd Bergmann wrote:
> On Thu, Oct 22, 2020 at 3:50 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Thu, Oct 22, 2020 at 02:57:59PM +0200, Greg KH wrote:
> > > On Thu, Oct 22, 2020 at 02:42:24PM +0200, David Hildenbrand wrote:
> 
> > > >  struct iovec *iovec_from_user(const struct iovec __user *uvec,
> > > > -               unsigned long nr_segs, unsigned long fast_segs,
> > > > +               unsigned nr_segs, unsigned fast_segs,
> > > >                 struct iovec *fast_iov, bool compat)
> > > >  {
> > > >         struct iovec *iov = fast_iov;
> > > > @@ -1738,7 +1738,7 @@ ssize_t __import_iovec(int type, const struct
> > > > iovec __user *uvec,
> > > >                  struct iov_iter *i, bool compat)
> > > >  {
> > > >         ssize_t total_len = 0;
> > > > -       unsigned long seg;
> > > > +       unsigned seg;
> > > >         struct iovec *iov;
> > > >
> > > >         iov = iovec_from_user(uvec, nr_segs, fast_segs, *iovp, compat);
> > > >
> > >
> > > Ah, I tested the other way around, making everything "unsigned long"
> > > instead.  Will go try this too, as other tests are still running...
> >
> > Ok, no, this didn't work either.
> >
> > Nick, I think I need some compiler help here.  Any ideas?
> 
> I don't think the patch above would reliably clear the upper bits if they
> contain garbage.
> 
> If the integer extension is the problem, the way I'd try it is to make the
> function take an 'unsigned long' and then explictly mask the upper
> bits with
> 
>      seg = lower_32_bits(seg);
> 
> Can you attach the iov_iter.s files from the broken build, plus the
> one with 'noinline' for comparison? Maybe something can be seen
> in there.

I don't know how to extract the .s files easily from the AOSP build
system, I'll look into that.  I'm also now testing by downgrading to an
older version of clang (10 instead of 11), to see if that matters at all
or not...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 14:40                                       ` Greg KH
@ 2020-10-22 16:15                                         ` David Laight
  0 siblings, 0 replies; 171+ messages in thread
From: David Laight @ 2020-10-22 16:15 UTC (permalink / raw)
  To: 'Greg KH', Arnd Bergmann
  Cc: David Hildenbrand, Al Viro, Nick Desaulniers, Christoph Hellwig,
	kernel-team, Andrew Morton, Jens Axboe, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: Greg KH
> Sent: 22 October 2020 15:40
> 
> On Thu, Oct 22, 2020 at 04:28:20PM +0200, Arnd Bergmann wrote:
...
> > Can you attach the iov_iter.s files from the broken build, plus the
> > one with 'noinline' for comparison? Maybe something can be seen
> > in there.
> 
> I don't know how to extract the .s files easily from the AOSP build
> system, I'll look into that.  I'm also now testing by downgrading to an
> older version of clang (10 instead of 11), to see if that matters at all
> or not...

Back from a day out - after it stopped raining.
Trying to use up leave before the end of the year.

Can you use objdump on the kernel binary itself and cut out
the single function?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 13:23                           ` Christoph Hellwig
@ 2020-10-22 16:35                             ` David Laight
  2020-10-22 16:40                               ` Matthew Wilcox
  2020-10-22 17:54                               ` Nick Desaulniers
  0 siblings, 2 replies; 171+ messages in thread
From: David Laight @ 2020-10-22 16:35 UTC (permalink / raw)
  To: 'Christoph Hellwig', David Hildenbrand
  Cc: Greg KH, Al Viro, Nick Desaulniers, kernel-team, Andrew Morton,
	Jens Axboe, Arnd Bergmann, David Howells, linux-arm-kernel,
	linux-kernel, linux-mips, linux-parisc, linuxppc-dev, linux-s390,
	sparclinux, linux-block, linux-scsi, linux-fsdevel, linux-aio,
	io-uring, linux-arch, linux-mm, netdev, keyrings,
	linux-security-module

From: Christoph Hellwig
> Sent: 22 October 2020 14:24
> 
> On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> > My thinking: if the compiler that calls import_iovec() has garbage in
> > the upper 32 bit
> >
> > a) gcc will zero it out and not rely on it being zero.
> > b) clang will not zero it out, assuming it is zero.
> >
> > But
> >
> > a) will zero it out when calling the !inlined variant
> > b) clang will zero it out when calling the !inlined variant
> >
> > When inlining, b) strikes. We access garbage. That would mean that we
> > have calling code that's not generated by clang/gcc IIUC.
> 
> Most callchains of import_iovec start with the assembly syscall wrappers.

Wait...
readv(2) defines:
	ssize_t readv(int fd, const struct iovec *iov, int iovcnt);

But the syscall is defined as:

SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
                unsigned long, vlen)
{
        return do_readv(fd, vec, vlen, 0);
}

I'm guessing that nothing actually masks the high bits that come
from an application that is compiled with clang?

The vlen is 'unsigned long' through the first few calls.
So unless there is a non-inlined function than takes vlen
as 'int' the high garbage bits from userspace are kept.

Which makes it a bug in the kernel C syscall wrappers.
They need to explicitly mask the high bits of 32bit
arguments on arm64 but not x86-64.

What does the ARM EABI say about register parameters?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 16:35                             ` David Laight
@ 2020-10-22 16:40                               ` Matthew Wilcox
  2020-10-22 16:50                                 ` David Laight
                                                   ` (2 more replies)
  2020-10-22 17:54                               ` Nick Desaulniers
  1 sibling, 3 replies; 171+ messages in thread
From: Matthew Wilcox @ 2020-10-22 16:40 UTC (permalink / raw)
  To: David Laight
  Cc: 'Christoph Hellwig',
	David Hildenbrand, Greg KH, Al Viro, Nick Desaulniers,
	kernel-team, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 04:35:17PM +0000, David Laight wrote:
> Wait...
> readv(2) defines:
> 	ssize_t readv(int fd, const struct iovec *iov, int iovcnt);

It doesn't really matter what the manpage says.  What does the AOSP
libc header say?

> But the syscall is defined as:
> 
> SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
>                 unsigned long, vlen)
> {
>         return do_readv(fd, vec, vlen, 0);
> }


^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 16:40                               ` Matthew Wilcox
@ 2020-10-22 16:50                                 ` David Laight
  2020-10-22 17:00                                 ` Nick Desaulniers
  2020-10-22 18:19                                 ` Al Viro
  2 siblings, 0 replies; 171+ messages in thread
From: David Laight @ 2020-10-22 16:50 UTC (permalink / raw)
  To: 'Matthew Wilcox'
  Cc: 'Christoph Hellwig',
	David Hildenbrand, Greg KH, Al Viro, Nick Desaulniers,
	kernel-team, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

From: Matthew Wilcox <willy@infradead.org>
> Sent: 22 October 2020 17:41
> 
> On Thu, Oct 22, 2020 at 04:35:17PM +0000, David Laight wrote:
> > Wait...
> > readv(2) defines:
> > 	ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
> 
> It doesn't really matter what the manpage says.  What does the AOSP
> libc header say?

The only copy I can find is:

/usr/include/x86_64-linux-gnu/sys/uio.h:extern ssize_t readv (int __fd, const struct iovec *__iovec, int __count)
/usr/include/x86_64-linux-gnu/sys/uio.h-  __wur;

and not surprisingly agrees.
POSIX and/or TOG will (more or less) mandate the prototype.

> > But the syscall is defined as:
> >
> > SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
> >                 unsigned long, vlen)
> > {
> >         return do_readv(fd, vec, vlen, 0);
> > }

I wonder when the high bits of 'fd' get zapped?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 16:40                               ` Matthew Wilcox
  2020-10-22 16:50                                 ` David Laight
@ 2020-10-22 17:00                                 ` Nick Desaulniers
  2020-10-22 20:59                                   ` Eric Biggers
  2020-10-22 18:19                                 ` Al Viro
  2 siblings, 1 reply; 171+ messages in thread
From: Nick Desaulniers @ 2020-10-22 17:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Laight, Christoph Hellwig, David Hildenbrand, Greg KH,
	Al Viro, kernel-team, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 9:40 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Oct 22, 2020 at 04:35:17PM +0000, David Laight wrote:
> > Wait...
> > readv(2) defines:
> >       ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
>
> It doesn't really matter what the manpage says.  What does the AOSP
> libc header say?

Same: https://android.googlesource.com/platform/bionic/+/refs/heads/master/libc/include/sys/uio.h#38

Theoretically someone could bypass libc to make a system call, right?

>
> > But the syscall is defined as:
> >
> > SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
> >                 unsigned long, vlen)
> > {
> >         return do_readv(fd, vec, vlen, 0);
> > }
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 16:35                             ` David Laight
  2020-10-22 16:40                               ` Matthew Wilcox
@ 2020-10-22 17:54                               ` Nick Desaulniers
       [not found]                                 ` <CAK8P3a3LjG+ZvmQrkb9zpgov8xBkQQWrkHBPgjfYSqBKGrwT4w@mail.gmail.com>
  1 sibling, 1 reply; 171+ messages in thread
From: Nick Desaulniers @ 2020-10-22 17:54 UTC (permalink / raw)
  To: David Laight
  Cc: Christoph Hellwig, David Hildenbrand, Greg KH, Al Viro,
	kernel-team, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module


[-- Attachment #1: Type: text/plain, Size: 2086 bytes --]

On Thu, Oct 22, 2020 at 9:35 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Christoph Hellwig
> > Sent: 22 October 2020 14:24
> >
> > On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> > > My thinking: if the compiler that calls import_iovec() has garbage in
> > > the upper 32 bit
> > >
> > > a) gcc will zero it out and not rely on it being zero.
> > > b) clang will not zero it out, assuming it is zero.
> > >
> > > But
> > >
> > > a) will zero it out when calling the !inlined variant
> > > b) clang will zero it out when calling the !inlined variant
> > >
> > > When inlining, b) strikes. We access garbage. That would mean that we
> > > have calling code that's not generated by clang/gcc IIUC.
> >
> > Most callchains of import_iovec start with the assembly syscall wrappers.
>
> Wait...
> readv(2) defines:
>         ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
>
> But the syscall is defined as:
>
> SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
>                 unsigned long, vlen)
> {
>         return do_readv(fd, vec, vlen, 0);
> }
>
> I'm guessing that nothing actually masks the high bits that come
> from an application that is compiled with clang?
>
> The vlen is 'unsigned long' through the first few calls.
> So unless there is a non-inlined function than takes vlen
> as 'int' the high garbage bits from userspace are kept.

Yeah, that's likely a bug: https://godbolt.org/z/KfsPKs

>
> Which makes it a bug in the kernel C syscall wrappers.
> They need to explicitly mask the high bits of 32bit
> arguments on arm64 but not x86-64.

Why not x86-64? Wouldn't it be *any* LP64 ISA?

Attaching a patch that uses the proper width, but I'm pretty sure
there's still a signedness issue .  Greg, would you mind running this
through the wringer?

>
> What does the ARM EABI say about register parameters?

AAPCS is the ABI for 64b ARM, IIUC, which is the ISA GKH is reporting
the problem against. IIUC, EABI is one of the 32b ABIs.  aarch64 is
LP64 just like x86_64.

--
Thanks,
~Nick Desaulniers

[-- Attachment #2: 0001-fs-fix-up-type-confusion-in-readv-writev.patch --]
[-- Type: application/octet-stream, Size: 4630 bytes --]

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 16:40                               ` Matthew Wilcox
  2020-10-22 16:50                                 ` David Laight
  2020-10-22 17:00                                 ` Nick Desaulniers
@ 2020-10-22 18:19                                 ` Al Viro
  2 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-10-22 18:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Laight, 'Christoph Hellwig',
	David Hildenbrand, Greg KH, Nick Desaulniers, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On Thu, Oct 22, 2020 at 05:40:40PM +0100, Matthew Wilcox wrote:
> On Thu, Oct 22, 2020 at 04:35:17PM +0000, David Laight wrote:
> > Wait...
> > readv(2) defines:
> > 	ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
> 
> It doesn't really matter what the manpage says.  What does the AOSP
> libc header say?

FWIW, see https://www.spinics.net/lists/linux-scsi/msg147836.html and
subthread from there on...

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
       [not found]                                 ` <CAK8P3a3LjG+ZvmQrkb9zpgov8xBkQQWrkHBPgjfYSqBKGrwT4w@mail.gmail.com>
@ 2020-10-22 19:04                                   ` Nick Desaulniers
  2020-10-22 19:24                                     ` Al Viro
  2020-10-22 22:04                                     ` David Laight
  0 siblings, 2 replies; 171+ messages in thread
From: Nick Desaulniers @ 2020-10-22 19:04 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: David Laight, Christoph Hellwig, David Hildenbrand, Greg KH,
	Al Viro, kernel-team, Andrew Morton, Jens Axboe, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On Thu, Oct 22, 2020 at 11:13 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Oct 22, 2020 at 7:54 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> > On Thu, Oct 22, 2020 at 9:35 AM David Laight <David.Laight@aculab.com> wrote:
> > >
> > > Which makes it a bug in the kernel C syscall wrappers.
> > > They need to explicitly mask the high bits of 32bit
> > > arguments on arm64 but not x86-64.
> >
> > Why not x86-64? Wouldn't it be *any* LP64 ISA?
>
> x86-64 is slightly special because most instructions on a 32-bit
> argument clear the upper 32 bits, while on most architectures
> the same instruction would leave the upper bits unchanged.

Oh interesting, depends on the operations too on x86_64 IIUC?

>
> > Attaching a patch that uses the proper width, but I'm pretty sure
> > there's still a signedness issue .  Greg, would you mind running this
> > through the wringer?
>
> I would not expect this to change anything for the bug that Greg
> is chasing, unless there is also a bug in clang.
>
> In the version before the patch, we get a 64-bit argument from
> user space, which may consist of the intended value in the lower
> bits plus garbage in the upper bits. However, vlen only gets
> passed down  into import_iovec() without any other operations
> on it, and since import_iovec takes a 32-bit argument, this is
> where it finally gets narrowed.

Passing an `unsigned long` as an `unsigned int` does no such
narrowing: https://godbolt.org/z/TvfMxe (same vice-versa, just tail
calls, no masking instructions).
So if rw_copy_check_uvector() is inlined into import_iovec() (looking
at the mainline@1028ae406999), then children calls of
`rw_copy_check_uvector()` will be interpreting the `nr_segs` register
unmodified, ie. garbage in the upper 32b.

>
> After your patch, the SYSCALL_DEFINE3() does the narrowing
> conversion with the same clearing of the upper bits.
>
> If there is a problem somewhere leading up to import_iovec(),
> it would have to in some code that expects to get a 32-bit
> register argument but gets called with a register that has
> garbage in the upper bits /without/ going through a correct
> sanitizing function like SYSCALL_DEFINE3().
>
>       Arnd



-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 19:04                                   ` Nick Desaulniers
@ 2020-10-22 19:24                                     ` Al Viro
  2020-10-22 19:27                                       ` Al Viro
                                                         ` (4 more replies)
  2020-10-22 22:04                                     ` David Laight
  1 sibling, 5 replies; 171+ messages in thread
From: Al Viro @ 2020-10-22 19:24 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Arnd Bergmann, David Laight, Christoph Hellwig,
	David Hildenbrand, Greg KH, kernel-team, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, linux-kernel,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, sparclinux,
	linux-block, linux-scsi, linux-fsdevel, linux-aio, io-uring,
	linux-arch, linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 12:04:52PM -0700, Nick Desaulniers wrote:

> Passing an `unsigned long` as an `unsigned int` does no such
> narrowing: https://godbolt.org/z/TvfMxe (same vice-versa, just tail
> calls, no masking instructions).
> So if rw_copy_check_uvector() is inlined into import_iovec() (looking
> at the mainline@1028ae406999), then children calls of
> `rw_copy_check_uvector()` will be interpreting the `nr_segs` register
> unmodified, ie. garbage in the upper 32b.

FWIW,

void f(unsinged long v)
{
	if (v != 1)
		printf("failed\n");
}

void g(unsigned int v)
{
	f(v);
}

void h(unsigned long v)
{
	g(v);
}

main()
{
	h(0x100000001);
}

must not produce any output on a host with 32bit int and 64bit long, regardless of
the inlining, having functions live in different compilation units, etc.

Depending upon the calling conventions, compiler might do truncation in caller or
in a callee, but it must be done _somewhere_.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 19:24                                     ` Al Viro
@ 2020-10-22 19:27                                       ` Al Viro
  2020-10-22 20:06                                       ` Al Viro
                                                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-10-22 19:27 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Arnd Bergmann, David Laight, Christoph Hellwig,
	David Hildenbrand, Greg KH, kernel-team, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, linux-kernel,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, sparclinux,
	linux-block, linux-scsi, linux-fsdevel, linux-aio, io-uring,
	linux-arch, linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 08:24:58PM +0100, Al Viro wrote:
> On Thu, Oct 22, 2020 at 12:04:52PM -0700, Nick Desaulniers wrote:
> 
> > Passing an `unsigned long` as an `unsigned int` does no such
> > narrowing: https://godbolt.org/z/TvfMxe (same vice-versa, just tail
> > calls, no masking instructions).
> > So if rw_copy_check_uvector() is inlined into import_iovec() (looking
> > at the mainline@1028ae406999), then children calls of
> > `rw_copy_check_uvector()` will be interpreting the `nr_segs` register
> > unmodified, ie. garbage in the upper 32b.
> 
> FWIW,
> 
> void f(unsinged long v)

s/unsinged/unsigned/, obviously...

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 19:24                                     ` Al Viro
  2020-10-22 19:27                                       ` Al Viro
@ 2020-10-22 20:06                                       ` Al Viro
  2020-10-22 20:09                                         ` Al Viro
  2020-10-22 20:11                                       ` Nick Desaulniers
                                                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 171+ messages in thread
From: Al Viro @ 2020-10-22 20:06 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Arnd Bergmann, David Laight, Christoph Hellwig,
	David Hildenbrand, Greg KH, kernel-team, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, linux-kernel,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, sparclinux,
	linux-block, linux-scsi, linux-fsdevel, linux-aio, io-uring,
	linux-arch, linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 08:24:58PM +0100, Al Viro wrote:

> Depending upon the calling conventions, compiler might do truncation in caller or
> in a callee, but it must be done _somewhere_.

Unless I'm misreading AAPCS64,
	"Unlike in the 32-bit AAPCS, named integral values must be narrowed by the callee
	 rather than the caller"
in 6.4.2 means that callee must not _not_ expect the upper 32 bits of %x0..%x7 to contain
anything valid for 32bit arguments and it must zero-extend %w0..%w7 when passing that to
something that expects a 64bit argument.  On inlining it should be the same situation as
storing unsigned int argument into unsigned long local variable and working with that - if

void f(unsigned int w)
{
	unsigned long x = w;
	printf("%lx\n", x);
}

ends up passing %x0 to printf, it's an obvious bug - it must do something like
	uxtw x0, w0
first.

What am I missing here?

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 20:06                                       ` Al Viro
@ 2020-10-22 20:09                                         ` Al Viro
  0 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-10-22 20:09 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Arnd Bergmann, David Laight, Christoph Hellwig,
	David Hildenbrand, Greg KH, kernel-team, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, linux-kernel,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, sparclinux,
	linux-block, linux-scsi, linux-fsdevel, linux-aio, io-uring,
	linux-arch, linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 09:06:29PM +0100, Al Viro wrote:
> On Thu, Oct 22, 2020 at 08:24:58PM +0100, Al Viro wrote:
> 
> > Depending upon the calling conventions, compiler might do truncation in caller or
> > in a callee, but it must be done _somewhere_.
> 
> Unless I'm misreading AAPCS64,
> 	"Unlike in the 32-bit AAPCS, named integral values must be narrowed by the callee
> 	 rather than the caller"
> in 6.4.2 means that callee must not _not_ expect the upper 32 bits of %x0..%x7 to contain

Sorry, artefact of editing - that's

"in 6.4.2 means that callee must _not_ expect the upper 32 bits of %x0..%x7 to contain"

obviously.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 19:24                                     ` Al Viro
  2020-10-22 19:27                                       ` Al Viro
  2020-10-22 20:06                                       ` Al Viro
@ 2020-10-22 20:11                                       ` Nick Desaulniers
  2020-10-22 22:07                                       ` David Laight
  2020-10-23 13:12                                       ` David Hildenbrand
  4 siblings, 0 replies; 171+ messages in thread
From: Nick Desaulniers @ 2020-10-22 20:11 UTC (permalink / raw)
  To: Al Viro
  Cc: Arnd Bergmann, David Laight, Christoph Hellwig,
	David Hildenbrand, Greg KH, kernel-team, Andrew Morton,
	Jens Axboe, David Howells, linux-arm-kernel, linux-kernel,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, sparclinux,
	linux-block, linux-scsi, linux-fsdevel, linux-aio, io-uring,
	linux-arch, linux-mm, netdev, keyrings, linux-security-module

On Thu, Oct 22, 2020 at 12:25 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Thu, Oct 22, 2020 at 12:04:52PM -0700, Nick Desaulniers wrote:
>
> > Passing an `unsigned long` as an `unsigned int` does no such
> > narrowing: https://godbolt.org/z/TvfMxe (same vice-versa, just tail
> > calls, no masking instructions).
> > So if rw_copy_check_uvector() is inlined into import_iovec() (looking
> > at the mainline@1028ae406999), then children calls of
> > `rw_copy_check_uvector()` will be interpreting the `nr_segs` register
> > unmodified, ie. garbage in the upper 32b.
>
> FWIW,
>
> void f(unsinged long v)
> {
>         if (v != 1)
>                 printf("failed\n");
> }
>
> void g(unsigned int v)
> {
>         f(v);
> }
>
> void h(unsigned long v)
> {
>         g(v);
> }
>
> main()
> {
>         h(0x100000001);
> }

A good/analogous example, but things get weird when the leaf node in
the call chain is inline asm: https://godbolt.org/z/s19TY5

(I'm not sure that's precisely what's going on here; I'll need to dive
more into the calls rw_copy_check_uvector() makes to see if there's
inline asm somewhere, pretty sure calls to get_user with `nr_regs`
exist).

>
> must not produce any output on a host with 32bit int and 64bit long, regardless of
> the inlining, having functions live in different compilation units, etc.
>
> Depending upon the calling conventions, compiler might do truncation in caller or
> in a callee, but it must be done _somewhere_.



-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 17:00                                 ` Nick Desaulniers
@ 2020-10-22 20:59                                   ` Eric Biggers
  2020-10-22 21:28                                     ` Al Viro
  0 siblings, 1 reply; 171+ messages in thread
From: Eric Biggers @ 2020-10-22 20:59 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Matthew Wilcox, David Laight, Christoph Hellwig,
	David Hildenbrand, Greg KH, Al Viro, kernel-team, Andrew Morton,
	Jens Axboe, Arnd Bergmann, David Howells, linux-arm-kernel,
	linux-kernel, linux-mips, linux-parisc, linuxppc-dev, linux-s390,
	sparclinux, linux-block, linux-scsi, linux-fsdevel, linux-aio,
	io-uring, linux-arch, linux-mm, netdev, keyrings,
	linux-security-module

On Thu, Oct 22, 2020 at 10:00:44AM -0700, Nick Desaulniers wrote:
> On Thu, Oct 22, 2020 at 9:40 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Oct 22, 2020 at 04:35:17PM +0000, David Laight wrote:
> > > Wait...
> > > readv(2) defines:
> > >       ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
> >
> > It doesn't really matter what the manpage says.  What does the AOSP
> > libc header say?
> 
> Same: https://android.googlesource.com/platform/bionic/+/refs/heads/master/libc/include/sys/uio.h#38
> 
> Theoretically someone could bypass libc to make a system call, right?
> 
> >
> > > But the syscall is defined as:
> > >
> > > SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
> > >                 unsigned long, vlen)
> > > {
> > >         return do_readv(fd, vec, vlen, 0);
> > > }
> >
> 

FWIW, glibc makes the readv() syscall assuming that fd and vlen are 'int' as
well.  So this problem isn't specific to Android's libc.

From objdump -d /lib/x86_64-linux-gnu/libc.so.6:

	00000000000f4db0 <readv@@GLIBC_2.2.5>:
	   f4db0:       64 8b 04 25 18 00 00    mov    %fs:0x18,%eax
	   f4db7:       00
	   f4db8:       85 c0                   test   %eax,%eax
	   f4dba:       75 14                   jne    f4dd0 <readv@@GLIBC_2.2.5+0x20>
	   f4dbc:       b8 13 00 00 00          mov    $0x13,%eax
	   f4dc1:       0f 05                   syscall
	   ...

There's some code for pthread cancellation, but no zeroing of the upper half of
the fd and vlen arguments, which are in %edi and %edx respectively.  But the
glibc function prototype uses 'int' for them, not 'unsigned long'
'ssize_t readv(int fd, const struct iovec *iov, int iovcnt);'.

So the high halves of the fd and iovcnt registers can contain garbage.  Or at
least that's what gcc (9.3.0) and clang (9.0.1) assume; they both compile the
following

void g(unsigned int x);

void f(unsigned long x)
{
        g(x);
}

into f() making a tail call to g(), without zeroing the top half of %rdi.

Also note the following program succeeds on Linux 5.9 on x86_64.  On kernels
that have this bug, it should fail.  (I couldn't get it to actually fail, so it
must depend on the compiler and/or the kernel config...)

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <sys/uio.h>
	#include <unistd.h>

	int main()
	{
		int fd = open("/dev/zero", O_RDONLY);
		char buf[1000];
		struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
		long ret;

		ret = syscall(__NR_readv, fd, &iov, 0x100000001);
		if (ret < 0)
			perror("readv failed");
		else
			printf("read %ld bytes\n", ret);
	}

I think the right fix is to change the readv() (and writev(), etc.) syscalls to
take 'unsigned int' rather than 'unsigned long', as that is what the users are
assuming...

- Eric

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 20:59                                   ` Eric Biggers
@ 2020-10-22 21:28                                     ` Al Viro
  0 siblings, 0 replies; 171+ messages in thread
From: Al Viro @ 2020-10-22 21:28 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Linus Torvalds, Nick Desaulniers, Matthew Wilcox, David Laight,
	Christoph Hellwig, David Hildenbrand, Greg KH, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On Thu, Oct 22, 2020 at 01:59:32PM -0700, Eric Biggers wrote:

> Also note the following program succeeds on Linux 5.9 on x86_64.  On kernels
> that have this bug, it should fail.  (I couldn't get it to actually fail, so it
> must depend on the compiler and/or the kernel config...)

It doesn't.  See https://www.spinics.net/lists/linux-scsi/msg147836.html for
discussion of that mess.

ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
                  unsigned long vlen, loff_t *pos, rwf_t flags)
{
        struct iovec iovstack[UIO_FASTIOV];
        struct iovec *iov = iovstack;
        struct iov_iter iter;
        ssize_t ret;

        ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
        if (ret >= 0) {
                ret = do_iter_read(file, &iter, pos, flags);
                kfree(iov);
        }

        return ret;
}

and import_iovec() takes unsigned int as the third argument, so it *will*
truncate to 32 bits, no matter what.  Has done so since 0504c074b546
"switch {compat_,}do_readv_writev() to {compat_,}import_iovec()" back in
March 2015.  Yes, it was an incompatible userland ABI change, even though
nothing that used glibc/uclibc/dietlibc would've noticed.

Better yet, up until 2.1.90pre1 passing a 64bit value as the _first_ argument
of readv(2) used to fail with -EBADF if it was too large; at that point it
started to get quietly truncated to 32bit first.  And again, no libc users
would've noticed (neither would anything except deliberate regression test
looking for that specific behaviour).

Note that we also have process_madvise(2) with size_t for vlen (huh?  It's
a number of array elements, not an object size) and process_vm_{read,write}v(2),
that have unsigned long for the same thing.  And the last two *are* using
the same unsigned long from glibc POV.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 19:04                                   ` Nick Desaulniers
  2020-10-22 19:24                                     ` Al Viro
@ 2020-10-22 22:04                                     ` David Laight
  1 sibling, 0 replies; 171+ messages in thread
From: David Laight @ 2020-10-22 22:04 UTC (permalink / raw)
  To: 'Nick Desaulniers', Arnd Bergmann
  Cc: Christoph Hellwig, David Hildenbrand, Greg KH, Al Viro,
	kernel-team, Andrew Morton, Jens Axboe, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: Nick Desaulniers
> Sent: 22 October 2020 20:05
> 
...
> Passing an `unsigned long` as an `unsigned int` does no such
> narrowing: https://godbolt.org/z/TvfMxe (same vice-versa, just tail
> calls, no masking instructions).

Right but is the called function going to use 32bit ops
and/or mask the register?
Certainly that is likely on x86-64.

I've rather lost track of where the masking is expected
to happen.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 19:24                                     ` Al Viro
                                                         ` (2 preceding siblings ...)
  2020-10-22 20:11                                       ` Nick Desaulniers
@ 2020-10-22 22:07                                       ` David Laight
  2020-10-23 13:12                                       ` David Hildenbrand
  4 siblings, 0 replies; 171+ messages in thread
From: David Laight @ 2020-10-22 22:07 UTC (permalink / raw)
  To: 'Al Viro', Nick Desaulniers
  Cc: Arnd Bergmann, Christoph Hellwig, David Hildenbrand, Greg KH,
	kernel-team, Andrew Morton, Jens Axboe, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: Al Viro
> Sent: 22 October 2020 20:25
> 
> On Thu, Oct 22, 2020 at 12:04:52PM -0700, Nick Desaulniers wrote:
> 
> > Passing an `unsigned long` as an `unsigned int` does no such
> > narrowing: https://godbolt.org/z/TvfMxe (same vice-versa, just tail
> > calls, no masking instructions).
> > So if rw_copy_check_uvector() is inlined into import_iovec() (looking
> > at the mainline@1028ae406999), then children calls of
> > `rw_copy_check_uvector()` will be interpreting the `nr_segs` register
> > unmodified, ie. garbage in the upper 32b.
> 
> FWIW,
> 
> void f(unsinged long v)
> {
> 	if (v != 1)
> 		printf("failed\n");
> }
> 
> void g(unsigned int v)
> {
> 	f(v);
> }
> 
> void h(unsigned long v)
> {
> 	g(v);
> }
> 
> main()
> {
> 	h(0x100000001);
> }
> 
> must not produce any output on a host with 32bit int and 64bit long, regardless of
> the inlining, having functions live in different compilation units, etc.
> 
> Depending upon the calling conventions, compiler might do truncation in caller or
> in a callee, but it must be done _somewhere_.

Put g() in a separate compilation unit and use the 'wrong' type
in the prototypes t() used to call g() and g() uses to call f().

Then you might see where and masking does (or does not) happen.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 13:50                                   ` Greg KH
       [not found]                                     ` <CAK8P3a1B7OVdyzW0-97JwzZiwp0D0fnSfyete16QTvPp_1m07A@mail.gmail.com>
@ 2020-10-23 12:46                                     ` David Laight
  2020-10-23 13:09                                       ` David Hildenbrand
       [not found]                                       ` <CAK8P3a1n+b8hOMhNQSDzgic03dyXbmpccfTJ3C1bGKvzsgMXbg@mail.gmail.com>
  1 sibling, 2 replies; 171+ messages in thread
From: David Laight @ 2020-10-23 12:46 UTC (permalink / raw)
  To: 'Greg KH', David Hildenbrand
  Cc: Al Viro, Nick Desaulniers, Christoph Hellwig, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: Greg KH <gregkh@linuxfoundation.org>
> Sent: 22 October 2020 14:51

I've rammed the code into godbolt.

https://godbolt.org/z/9v5PPW

Definitely a clang bug.

Search for [wx]24 in the clang output.
nr_segs comes in as w2 and the initial bound checks are done on w2.
w24 is loaded from w2 - I don't believe this changes the high bits.
There are no references to w24, just x24.
So the kmalloc_array() is passed 'huge' and will fail.
The iov_iter_init also gets the 64bit value.

Note that the gcc code has a sign-extend copy of w2.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-23 12:46                                     ` David Laight
@ 2020-10-23 13:09                                       ` David Hildenbrand
  2020-10-23 14:33                                         ` David Hildenbrand
  2020-10-23 17:58                                         ` Al Viro
       [not found]                                       ` <CAK8P3a1n+b8hOMhNQSDzgic03dyXbmpccfTJ3C1bGKvzsgMXbg@mail.gmail.com>
  1 sibling, 2 replies; 171+ messages in thread
From: David Hildenbrand @ 2020-10-23 13:09 UTC (permalink / raw)
  To: David Laight, 'Greg KH'
  Cc: Al Viro, Nick Desaulniers, Christoph Hellwig, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On 23.10.20 14:46, David Laight wrote:
> From: Greg KH <gregkh@linuxfoundation.org>
>> Sent: 22 October 2020 14:51
> 
> I've rammed the code into godbolt.
> 
> https://godbolt.org/z/9v5PPW
> 
> Definitely a clang bug.
> 
> Search for [wx]24 in the clang output.
> nr_segs comes in as w2 and the initial bound checks are done on w2.
> w24 is loaded from w2 - I don't believe this changes the high bits.
> There are no references to w24, just x24.
> So the kmalloc_array() is passed 'huge' and will fail.
> The iov_iter_init also gets the 64bit value.
> 
> Note that the gcc code has a sign-extend copy of w2.

Do we have a result from using "unsigned long" in the base function and
explicitly masking of the high bits? That should definitely work.

Now, I am not a compiler expert, but as I already cited, at least on
x86-64 clang expects that the high bits were cleared by the caller - in
contrast to gcc. I suspect it's the same on arm64, but again, I am no
compiler expert.

If what I said and cites for x86-64 is correct, if the function expects
an "unsigned int", it will happily use 64bit operations without further
checks where valid when assuming high bits are zero. That's why even
converting everything to "unsigned int" as proposed by me won't work on
clang - it assumes high bits are zero (as indicated by Nick).

As I am neither a compiler experts (did I mention that already? ;) ) nor
an arm64 experts, I can't tell if this is a compiler BUG or not.

Main issue seems to be garbage in high bits as originally suggested by me.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-22 19:24                                     ` Al Viro
                                                         ` (3 preceding siblings ...)
  2020-10-22 22:07                                       ` David Laight
@ 2020-10-23 13:12                                       ` David Hildenbrand
  4 siblings, 0 replies; 171+ messages in thread
From: David Hildenbrand @ 2020-10-23 13:12 UTC (permalink / raw)
  To: Al Viro, Nick Desaulniers
  Cc: Arnd Bergmann, David Laight, Christoph Hellwig, Greg KH,
	kernel-team, Andrew Morton, Jens Axboe, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On 22.10.20 21:24, Al Viro wrote:
> On Thu, Oct 22, 2020 at 12:04:52PM -0700, Nick Desaulniers wrote:
> 
>> Passing an `unsigned long` as an `unsigned int` does no such
>> narrowing: https://godbolt.org/z/TvfMxe (same vice-versa, just tail
>> calls, no masking instructions).
>> So if rw_copy_check_uvector() is inlined into import_iovec() (looking
>> at the mainline@1028ae406999), then children calls of
>> `rw_copy_check_uvector()` will be interpreting the `nr_segs` register
>> unmodified, ie. garbage in the upper 32b.
> 
> FWIW,
> 
> void f(unsinged long v)
> {
> 	if (v != 1)
> 		printf("failed\n");
> }
> 
> void g(unsigned int v)
> {
> 	f(v);
> }
> 
> void h(unsigned long v)
> {
> 	g(v);
> }
> 
> main()
> {
> 	h(0x100000001);
> }
> 
> must not produce any output on a host with 32bit int and 64bit long, regardless of
> the inlining, having functions live in different compilation units, etc.
> 
> Depending upon the calling conventions, compiler might do truncation in caller or
> in a callee, but it must be done _somewhere_.

The interesting case is having g() in a separate compilation unit and
force-calling g() with 0x100000001 via inline ASM. So forcing garbage
into high bits.

I'll paly with it.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
       [not found]                                       ` <CAK8P3a1n+b8hOMhNQSDzgic03dyXbmpccfTJ3C1bGKvzsgMXbg@mail.gmail.com>
@ 2020-10-23 13:28                                         ` David Laight
  0 siblings, 0 replies; 171+ messages in thread
From: David Laight @ 2020-10-23 13:28 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: Greg KH, David Hildenbrand, Al Viro, Nick Desaulniers,
	Christoph Hellwig, kernel-team, Andrew Morton, Jens Axboe,
	David Howells, linux-arm-kernel, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, linux-s390, sparclinux, linux-block,
	linux-scsi, linux-fsdevel, linux-aio, io-uring, linux-arch,
	linux-mm, netdev, keyrings, linux-security-module

From: Arnd Bergmann
> Sent: 23 October 2020 14:23
> 
> On Fri, Oct 23, 2020 at 2:46 PM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Greg KH <gregkh@linuxfoundation.org>
> > > Sent: 22 October 2020 14:51
> >
> > I've rammed the code into godbolt.
> >
> > https://godbolt.org/z/9v5PPW
> >
> > Definitely a clang bug.
> >
> > Search for [wx]24 in the clang output.
> > nr_segs comes in as w2 and the initial bound checks are done on w2.
> > w24 is loaded from w2 - I don't believe this changes the high bits.
> 
> You believe wrong, "mov w24, w2" is a zero-extending operation.

Ah ok, but gcc uses utxw for the same task.
I guess they could be the same opcode.

Last time I wrote ARM thumb didn't really exist - never mind 64bit

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-23 13:09                                       ` David Hildenbrand
@ 2020-10-23 14:33                                         ` David Hildenbrand
  2020-10-23 14:39                                           ` David Laight
  2020-10-23 17:58                                         ` Al Viro
  1 sibling, 1 reply; 171+ messages in thread
From: David Hildenbrand @ 2020-10-23 14:33 UTC (permalink / raw)
  To: David Laight, 'Greg KH'
  Cc: Al Viro, Nick Desaulniers, Christoph Hellwig, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On 23.10.20 15:09, David Hildenbrand wrote:
> On 23.10.20 14:46, David Laight wrote:
>> From: Greg KH <gregkh@linuxfoundation.org>
>>> Sent: 22 October 2020 14:51
>>
>> I've rammed the code into godbolt.
>>
>> https://godbolt.org/z/9v5PPW
>>
>> Definitely a clang bug.
>>
>> Search for [wx]24 in the clang output.
>> nr_segs comes in as w2 and the initial bound checks are done on w2.
>> w24 is loaded from w2 - I don't believe this changes the high bits.
>> There are no references to w24, just x24.
>> So the kmalloc_array() is passed 'huge' and will fail.
>> The iov_iter_init also gets the 64bit value.
>>
>> Note that the gcc code has a sign-extend copy of w2.
> 
> Do we have a result from using "unsigned long" in the base function and
> explicitly masking of the high bits? That should definitely work.
> 
> Now, I am not a compiler expert, but as I already cited, at least on
> x86-64 clang expects that the high bits were cleared by the caller - in
> contrast to gcc. I suspect it's the same on arm64, but again, I am no
> compiler expert.
> 
> If what I said and cites for x86-64 is correct, if the function expects
> an "unsigned int", it will happily use 64bit operations without further
> checks where valid when assuming high bits are zero. That's why even
> converting everything to "unsigned int" as proposed by me won't work on
> clang - it assumes high bits are zero (as indicated by Nick).
> 
> As I am neither a compiler experts (did I mention that already? ;) ) nor
> an arm64 experts, I can't tell if this is a compiler BUG or not.
> 

I just checked against upstream code generated by clang 10 and it
properly discards the upper 32bit via a mov w23 w2.

So at least clang 10 indeed properly assumes we could have garbage and
masks it off.

Maybe the issue is somewhere else, unrelated to nr_pages ... or clang 11
behaves differently.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-23 14:33                                         ` David Hildenbrand
@ 2020-10-23 14:39                                           ` David Laight
  2020-10-23 14:47                                             ` 'Greg KH'
  0 siblings, 1 reply; 171+ messages in thread
From: David Laight @ 2020-10-23 14:39 UTC (permalink / raw)
  To: 'David Hildenbrand', 'Greg KH'
  Cc: Al Viro, Nick Desaulniers, Christoph Hellwig, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: David Hildenbrand
> Sent: 23 October 2020 15:33
...
> I just checked against upstream code generated by clang 10 and it
> properly discards the upper 32bit via a mov w23 w2.
> 
> So at least clang 10 indeed properly assumes we could have garbage and
> masks it off.
> 
> Maybe the issue is somewhere else, unrelated to nr_pages ... or clang 11
> behaves differently.

We'll need the disassembly from a failing kernel image.
It isn't that big to hand annotate.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-23 14:39                                           ` David Laight
@ 2020-10-23 14:47                                             ` 'Greg KH'
  2020-10-23 16:33                                               ` David Hildenbrand
  2020-11-02  9:06                                               ` David Laight
  0 siblings, 2 replies; 171+ messages in thread
From: 'Greg KH' @ 2020-10-23 14:47 UTC (permalink / raw)
  To: David Laight
  Cc: 'David Hildenbrand',
	Al Viro, Nick Desaulniers, Christoph Hellwig, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On Fri, Oct 23, 2020 at 02:39:24PM +0000, David Laight wrote:
> From: David Hildenbrand
> > Sent: 23 October 2020 15:33
> ...
> > I just checked against upstream code generated by clang 10 and it
> > properly discards the upper 32bit via a mov w23 w2.
> > 
> > So at least clang 10 indeed properly assumes we could have garbage and
> > masks it off.
> > 
> > Maybe the issue is somewhere else, unrelated to nr_pages ... or clang 11
> > behaves differently.
> 
> We'll need the disassembly from a failing kernel image.
> It isn't that big to hand annotate.

I've worked around the merge at the moment in the android tree, but it
is still quite reproducable, and will try to get a .o file to
disassemble on Monday or so...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-23 14:47                                             ` 'Greg KH'
@ 2020-10-23 16:33                                               ` David Hildenbrand
  2020-11-02  9:06                                               ` David Laight
  1 sibling, 0 replies; 171+ messages in thread
From: David Hildenbrand @ 2020-10-23 16:33 UTC (permalink / raw)
  To: 'Greg KH', David Laight
  Cc: Al Viro, Nick Desaulniers, Christoph Hellwig, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On 23.10.20 16:47, 'Greg KH' wrote:
> On Fri, Oct 23, 2020 at 02:39:24PM +0000, David Laight wrote:
>> From: David Hildenbrand
>>> Sent: 23 October 2020 15:33
>> ...
>>> I just checked against upstream code generated by clang 10 and it
>>> properly discards the upper 32bit via a mov w23 w2.
>>>
>>> So at least clang 10 indeed properly assumes we could have garbage and
>>> masks it off.
>>>
>>> Maybe the issue is somewhere else, unrelated to nr_pages ... or clang 11
>>> behaves differently.
>>
>> We'll need the disassembly from a failing kernel image.
>> It isn't that big to hand annotate.
> 
> I've worked around the merge at the moment in the android tree, but it
> is still quite reproducable, and will try to get a .o file to
> disassemble on Monday or so...

I just compiled pre and post fb041b598997d63c0f7d7305dfae70046bf66fe1 with

clang version 11.0.0 (Fedora 11.0.0-0.2.rc1.fc33)

for aarch64 with defconfig and extracted import_iovec and
rw_copy_check_uvector (skipping the compat things)

Pre fb041b598997d63c0f7d7305dfae70046bf66fe1 import_iovec
-> https://pastebin.com/LtnYMLJt
Post fb041b598997d63c0f7d7305dfae70046bf66fe1 import_iovec
-> https://pastebin.com/BWPmXrAf
Pre fb041b598997d63c0f7d7305dfae70046bf66fe1 rw_copy_check_uvector
-> https://pastebin.com/4nSBYRbf
Post fb041b598997d63c0f7d7305dfae70046bf66fe1 rw_copy_check_uvector
-> https://pastebin.com/hPtEgaEW

I'm only able to spot minor differences ... less gets inlined than I
would have expected. But there are some smaller differences.

Maybe someone wants to have a look before we have object files as used
by Greg ...

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-23 13:09                                       ` David Hildenbrand
  2020-10-23 14:33                                         ` David Hildenbrand
@ 2020-10-23 17:58                                         ` Al Viro
  2020-10-23 18:27                                           ` Segher Boessenkool
  1 sibling, 1 reply; 171+ messages in thread
From: Al Viro @ 2020-10-23 17:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: David Laight, 'Greg KH',
	Nick Desaulniers, Christoph Hellwig, kernel-team, Andrew Morton,
	Jens Axboe, Arnd Bergmann, David Howells, linux-arm-kernel,
	linux-kernel, linux-mips, linux-parisc, linuxppc-dev, linux-s390,
	sparclinux, linux-block, linux-scsi, linux-fsdevel, linux-aio,
	io-uring, linux-arch, linux-mm, netdev, keyrings,
	linux-security-module

On Fri, Oct 23, 2020 at 03:09:30PM +0200, David Hildenbrand wrote:

> Now, I am not a compiler expert, but as I already cited, at least on
> x86-64 clang expects that the high bits were cleared by the caller - in
> contrast to gcc. I suspect it's the same on arm64, but again, I am no
> compiler expert.
> 
> If what I said and cites for x86-64 is correct, if the function expects
> an "unsigned int", it will happily use 64bit operations without further
> checks where valid when assuming high bits are zero. That's why even
> converting everything to "unsigned int" as proposed by me won't work on
> clang - it assumes high bits are zero (as indicated by Nick).
> 
> As I am neither a compiler experts (did I mention that already? ;) ) nor
> an arm64 experts, I can't tell if this is a compiler BUG or not.

On arm64 when callee expects a 32bit argument, the caller is *not* responsible
for clearing the upper half of 64bit register used to pass the value - it only
needs to store the actual value into the lower half.  The callee must consider
the contents of the upper half of that register as undefined.  See AAPCS64 (e.g.
https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#parameter-passing-rules
); AFAICS, the relevant bit is
	"Unlike in the 32-bit AAPCS, named integral values must be narrowed by
the callee rather than the caller."

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-23 17:58                                         ` Al Viro
@ 2020-10-23 18:27                                           ` Segher Boessenkool
  2020-10-23 21:28                                             ` David Laight
  0 siblings, 1 reply; 171+ messages in thread
From: Segher Boessenkool @ 2020-10-23 18:27 UTC (permalink / raw)
  To: Al Viro
  Cc: David Hildenbrand, linux-aio, linux-mips, David Howells,
	linux-mm, keyrings, sparclinux, Christoph Hellwig, linux-arch,
	linux-s390, linux-scsi, kernel-team, Arnd Bergmann, linux-block,
	io-uring, linux-arm-kernel, Jens Axboe, linux-parisc,
	'Greg KH',
	Nick Desaulniers, linux-kernel, linux-security-module,
	David Laight, netdev, linux-fsdevel, Andrew Morton, linuxppc-dev

On Fri, Oct 23, 2020 at 06:58:57PM +0100, Al Viro wrote:
> On Fri, Oct 23, 2020 at 03:09:30PM +0200, David Hildenbrand wrote:
> 
> > Now, I am not a compiler expert, but as I already cited, at least on
> > x86-64 clang expects that the high bits were cleared by the caller - in
> > contrast to gcc. I suspect it's the same on arm64, but again, I am no
> > compiler expert.
> > 
> > If what I said and cites for x86-64 is correct, if the function expects
> > an "unsigned int", it will happily use 64bit operations without further
> > checks where valid when assuming high bits are zero. That's why even
> > converting everything to "unsigned int" as proposed by me won't work on
> > clang - it assumes high bits are zero (as indicated by Nick).
> > 
> > As I am neither a compiler experts (did I mention that already? ;) ) nor
> > an arm64 experts, I can't tell if this is a compiler BUG or not.
> 
> On arm64 when callee expects a 32bit argument, the caller is *not* responsible
> for clearing the upper half of 64bit register used to pass the value - it only
> needs to store the actual value into the lower half.  The callee must consider
> the contents of the upper half of that register as undefined.  See AAPCS64 (e.g.
> https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#parameter-passing-rules
> ); AFAICS, the relevant bit is
> 	"Unlike in the 32-bit AAPCS, named integral values must be narrowed by
> the callee rather than the caller."

Or the formal rule:

C.9 	If the argument is an Integral or Pointer Type, the size of the
	argument is less than or equal to 8 bytes and the NGRN is less
	than 8, the argument is copied to the least significant bits in
	x[NGRN]. The NGRN is incremented by one. The argument has now
	been allocated.


Segher

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-23 18:27                                           ` Segher Boessenkool
@ 2020-10-23 21:28                                             ` David Laight
  2020-10-24 17:29                                               ` Segher Boessenkool
  0 siblings, 1 reply; 171+ messages in thread
From: David Laight @ 2020-10-23 21:28 UTC (permalink / raw)
  To: 'Segher Boessenkool', Al Viro
  Cc: David Hildenbrand, linux-aio, linux-mips, David Howells,
	linux-mm, keyrings, sparclinux, Christoph Hellwig, linux-arch,
	linux-s390, linux-scsi, kernel-team, Arnd Bergmann, linux-block,
	io-uring, linux-arm-kernel, Jens Axboe, linux-parisc,
	'Greg KH',
	Nick Desaulniers, linux-kernel, linux-security-module, netdev,
	linux-fsdevel, Andrew Morton, linuxppc-dev

From: Segher Boessenkool
> Sent: 23 October 2020 19:27
> 
> On Fri, Oct 23, 2020 at 06:58:57PM +0100, Al Viro wrote:
> > On Fri, Oct 23, 2020 at 03:09:30PM +0200, David Hildenbrand wrote:
> >
> > > Now, I am not a compiler expert, but as I already cited, at least on
> > > x86-64 clang expects that the high bits were cleared by the caller - in
> > > contrast to gcc. I suspect it's the same on arm64, but again, I am no
> > > compiler expert.
> > >
> > > If what I said and cites for x86-64 is correct, if the function expects
> > > an "unsigned int", it will happily use 64bit operations without further
> > > checks where valid when assuming high bits are zero. That's why even
> > > converting everything to "unsigned int" as proposed by me won't work on
> > > clang - it assumes high bits are zero (as indicated by Nick).
> > >
> > > As I am neither a compiler experts (did I mention that already? ;) ) nor
> > > an arm64 experts, I can't tell if this is a compiler BUG or not.
> >
> > On arm64 when callee expects a 32bit argument, the caller is *not* responsible
> > for clearing the upper half of 64bit register used to pass the value - it only
> > needs to store the actual value into the lower half.  The callee must consider
> > the contents of the upper half of that register as undefined.  See AAPCS64 (e.g.
> > https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#parameter-passing-rules
> > ); AFAICS, the relevant bit is
> > 	"Unlike in the 32-bit AAPCS, named integral values must be narrowed by
> > the callee rather than the caller."
> 
> Or the formal rule:
> 
> C.9 	If the argument is an Integral or Pointer Type, the size of the
> 	argument is less than or equal to 8 bytes and the NGRN is less
> 	than 8, the argument is copied to the least significant bits in
> 	x[NGRN]. The NGRN is incremented by one. The argument has now
> 	been allocated.

So, in essence, if the value is in a 64bit register the calling
code is independent of the actual type of the formal parameter.
Clearly a value might need explicit widening.

I've found a copy of the 64 bit arm instruction set.
Unfortunately it is alpha sorted and repetitive so shows none
of the symmetry and makes things difficult to find.
But, contrary to what someone suggested most register writes
(eg from arithmetic) seem to zero/extend the high bits.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-23 21:28                                             ` David Laight
@ 2020-10-24 17:29                                               ` Segher Boessenkool
  2020-10-24 21:12                                                 ` David Laight
  0 siblings, 1 reply; 171+ messages in thread
From: Segher Boessenkool @ 2020-10-24 17:29 UTC (permalink / raw)
  To: David Laight
  Cc: Al Viro, David Hildenbrand, linux-aio, linux-mips, David Howells,
	linux-mm, keyrings, sparclinux, Christoph Hellwig, linux-arch,
	linux-s390, linux-scsi, kernel-team, Arnd Bergmann, linux-block,
	io-uring, linux-arm-kernel, Jens Axboe, linux-parisc,
	'Greg KH',
	Nick Desaulniers, linux-kernel, linux-security-module, netdev,
	linux-fsdevel, Andrew Morton, linuxppc-dev

On Fri, Oct 23, 2020 at 09:28:59PM +0000, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 23 October 2020 19:27
> > On Fri, Oct 23, 2020 at 06:58:57PM +0100, Al Viro wrote:
> > > On Fri, Oct 23, 2020 at 03:09:30PM +0200, David Hildenbrand wrote:
> > > On arm64 when callee expects a 32bit argument, the caller is *not* responsible
> > > for clearing the upper half of 64bit register used to pass the value - it only
> > > needs to store the actual value into the lower half.  The callee must consider
> > > the contents of the upper half of that register as undefined.  See AAPCS64 (e.g.
> > > https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#parameter-passing-rules
> > > ); AFAICS, the relevant bit is
> > > 	"Unlike in the 32-bit AAPCS, named integral values must be narrowed by
> > > the callee rather than the caller."
> > 
> > Or the formal rule:
> > 
> > C.9 	If the argument is an Integral or Pointer Type, the size of the
> > 	argument is less than or equal to 8 bytes and the NGRN is less
> > 	than 8, the argument is copied to the least significant bits in
> > 	x[NGRN]. The NGRN is incremented by one. The argument has now
> > 	been allocated.
> 
> So, in essence, if the value is in a 64bit register the calling
> code is independent of the actual type of the formal parameter.
> Clearly a value might need explicit widening.

No, this says that if you pass a 32-bit integer in a 64-bit register,
then the top 32 bits of that register hold an undefined value.

> I've found a copy of the 64 bit arm instruction set.
> Unfortunately it is alpha sorted and repetitive so shows none
> of the symmetry and makes things difficult to find.

All of this is ABI, not ISA.  Look at the AAPCS64 pointed to above.

> But, contrary to what someone suggested most register writes
> (eg from arithmetic) seem to zero/extend the high bits.

Everything that writes a "w" does, yes.  But that has nothing to do with
the parameter passing rules, that is ABI.  It just means that very often
a 32-bit integer will be passed zero-extended in a 64-bit register, but
that is just luck (or not, it makes finding bugs harder ;-) )


Segher

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-24 17:29                                               ` Segher Boessenkool
@ 2020-10-24 21:12                                                 ` David Laight
  0 siblings, 0 replies; 171+ messages in thread
From: David Laight @ 2020-10-24 21:12 UTC (permalink / raw)
  To: 'Segher Boessenkool'
  Cc: Al Viro, David Hildenbrand, linux-aio, linux-mips, David Howells,
	linux-mm, keyrings, sparclinux, Christoph Hellwig, linux-arch,
	linux-s390, linux-scsi, kernel-team, Arnd Bergmann, linux-block,
	io-uring, linux-arm-kernel, Jens Axboe, linux-parisc,
	'Greg KH',
	Nick Desaulniers, linux-kernel, linux-security-module, netdev,
	linux-fsdevel, Andrew Morton, linuxppc-dev

From: Segher Boessenkool
> Sent: 24 October 2020 18:29
> 
> On Fri, Oct 23, 2020 at 09:28:59PM +0000, David Laight wrote:
> > From: Segher Boessenkool
> > > Sent: 23 October 2020 19:27
> > > On Fri, Oct 23, 2020 at 06:58:57PM +0100, Al Viro wrote:
> > > > On Fri, Oct 23, 2020 at 03:09:30PM +0200, David Hildenbrand wrote:
> > > > On arm64 when callee expects a 32bit argument, the caller is *not* responsible
> > > > for clearing the upper half of 64bit register used to pass the value - it only
> > > > needs to store the actual value into the lower half.  The callee must consider
> > > > the contents of the upper half of that register as undefined.  See AAPCS64 (e.g.
> > > > https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#parameter-passing-rules
> > > > ); AFAICS, the relevant bit is
> > > > 	"Unlike in the 32-bit AAPCS, named integral values must be narrowed by
> > > > the callee rather than the caller."
> > >
> > > Or the formal rule:
> > >
> > > C.9 	If the argument is an Integral or Pointer Type, the size of the
> > > 	argument is less than or equal to 8 bytes and the NGRN is less
> > > 	than 8, the argument is copied to the least significant bits in
> > > 	x[NGRN]. The NGRN is incremented by one. The argument has now
> > > 	been allocated.
> >
> > So, in essence, if the value is in a 64bit register the calling
> > code is independent of the actual type of the formal parameter.
> > Clearly a value might need explicit widening.
> 
> No, this says that if you pass a 32-bit integer in a 64-bit register,
> then the top 32 bits of that register hold an undefined value.

That's sort of what I meant.
The 'normal' junk in the hight bits will there because the variable
in the calling code is wider.

> > I've found a copy of the 64 bit arm instruction set.
> > Unfortunately it is alpha sorted and repetitive so shows none
> > of the symmetry and makes things difficult to find.
> 
> All of this is ABI, not ISA.  Look at the AAPCS64 pointed to above.
> 
> > But, contrary to what someone suggested most register writes
> > (eg from arithmetic) seem to zero/extend the high bits.
> 
> Everything that writes a "w" does, yes.  But that has nothing to do with
> the parameter passing rules, that is ABI.  It just means that very often
> a 32-bit integer will be passed zero-extended in a 64-bit register, but
> that is just luck (or not, it makes finding bugs harder ;-) )

Working out why the code is wrong is more of an ISA issue than an ABI one.
It may be an ABI one, but the analysis is ISA.

I've written a lot of asm over the years - decoding compiler generated
asm isn't that hard.
At least ARM doesn't have annulled delay slots.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-10-23 14:47                                             ` 'Greg KH'
  2020-10-23 16:33                                               ` David Hildenbrand
@ 2020-11-02  9:06                                               ` David Laight
  2020-11-02 13:52                                                 ` 'Greg KH'
  1 sibling, 1 reply; 171+ messages in thread
From: David Laight @ 2020-11-02  9:06 UTC (permalink / raw)
  To: 'Greg KH'
  Cc: 'David Hildenbrand',
	Al Viro, Nick Desaulniers, Christoph Hellwig, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

From: 'Greg KH'
> Sent: 23 October 2020 15:47
> 
> On Fri, Oct 23, 2020 at 02:39:24PM +0000, David Laight wrote:
> > From: David Hildenbrand
> > > Sent: 23 October 2020 15:33
> > ...
> > > I just checked against upstream code generated by clang 10 and it
> > > properly discards the upper 32bit via a mov w23 w2.
> > >
> > > So at least clang 10 indeed properly assumes we could have garbage and
> > > masks it off.
> > >
> > > Maybe the issue is somewhere else, unrelated to nr_pages ... or clang 11
> > > behaves differently.
> >
> > We'll need the disassembly from a failing kernel image.
> > It isn't that big to hand annotate.
> 
> I've worked around the merge at the moment in the android tree, but it
> is still quite reproducable, and will try to get a .o file to
> disassemble on Monday or so...

Did this get properly resolved?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
  2020-11-02  9:06                                               ` David Laight
@ 2020-11-02 13:52                                                 ` 'Greg KH'
  2020-11-02 18:23                                                   ` David Laight
  0 siblings, 1 reply; 171+ messages in thread
From: 'Greg KH' @ 2020-11-02 13:52 UTC (permalink / raw)
  To: David Laight
  Cc: 'David Hildenbrand',
	Al Viro, Nick Desaulniers, Christoph Hellwig, kernel-team,
	Andrew Morton, Jens Axboe, Arnd Bergmann, David Howells,
	linux-arm-kernel, linux-kernel, linux-mips, linux-parisc,
	linuxppc-dev, linux-s390, sparclinux, linux-block, linux-scsi,
	linux-fsdevel, linux-aio, io-uring, linux-arch, linux-mm, netdev,
	keyrings, linux-security-module

On Mon, Nov 02, 2020 at 09:06:38AM +0000, David Laight wrote:
> From: 'Greg KH'
> > Sent: 23 October 2020 15:47
> > 
> > On Fri, Oct 23, 2020 at 02:39:24PM +0000, David Laight wrote:
> > > From: David Hildenbrand
> > > > Sent: 23 October 2020 15:33
> > > ...
> > > > I just checked against upstream code generated by clang 10 and it
> > > > properly discards the upper 32bit via a mov w23 w2.
> > > >
> > > > So at least clang 10 indeed properly assumes we could have garbage and
> > > > masks it off.
> > > >
> > > > Maybe the issue is somewhere else, unrelated to nr_pages ... or clang 11
> > > > behaves differently.
> > >
> > > We'll need the disassembly from a failing kernel image.
> > > It isn't that big to hand annotate.
> > 
> > I've worked around the merge at the moment in the android tree, but it
> > is still quite reproducable, and will try to get a .o file to
> > disassemble on Monday or so...
> 
> Did this get properly resolved?

For some reason, 5.10-rc2 fixed all of this up.  I backed out all of the
patches I had to revert to get 5.10-rc1 to work properly, and then did
the merge and all is well.

It must have been something to do with the compat changes in this same
area that went in after 5.10-rc1, and something got reorganized in the
files somehow.  I really do not know, and at the moment, don't have the
time to track it down anymore.  So for now, I'd say it's all good, sorry
for the noise.

greg k-h

^ permalink