linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/5] Add preadv & pwritev system calls.
@ 2009-01-16 16:45 Gerd Hoffmann
  2009-01-16 16:45 ` [PATCH 1/5] create compat_readv() Gerd Hoffmann
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Gerd Hoffmann @ 2009-01-16 16:45 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-api; +Cc: aarcange, Gerd Hoffmann

  Hi folks,

Next round of the preadv & pwritev patch series.  Had no review comments
to fix.  That means it is ready to be merged, right?

Changes:
 - compat_sys_{read,write}v bugfix patch dropped (merged).
 - rebase to latest git, adapt to CVE-2009-0029 changes.

How to proceed now?  Is there a syscall maintainer where I could queue
up the patches?  If not, anyone (akpm?) willng to pick this up?  Should
I try to send to Linus directly?

What is the usual way to handle the arch-specific syscall windup?  I'd
prefer to leave that to the arch maintainers as they know best what
needs to be done, is that ok?  Right now only x86 (/me) and mips (patch
from Ralf Baechle) is covered ...

cheers,
  Gerd


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] create compat_readv()
  2009-01-16 16:45 [PATCH v6 0/5] Add preadv & pwritev system calls Gerd Hoffmann
@ 2009-01-16 16:45 ` Gerd Hoffmann
  2009-01-16 16:45 ` [PATCH 2/5] create compat_writev() Gerd Hoffmann
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Gerd Hoffmann @ 2009-01-16 16:45 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-api; +Cc: aarcange, Gerd Hoffmann

Factor out some code from compat_sys_readv() which can be shared with the
upcoming compat_sys_preadv().

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 fs/compat.c |   24 ++++++++++++++++--------
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index 65a070e..63738cc 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1167,16 +1167,11 @@ out:
 	return ret;
 }
 
-asmlinkage ssize_t
-compat_sys_readv(unsigned long fd, const struct compat_iovec __user *vec, unsigned long vlen)
+static size_t compat_readv(struct file *file, const struct compat_iovec __user *vec,
+                           unsigned long vlen, loff_t *pos)
 {
-	struct file *file;
 	ssize_t ret = -EBADF;
 
-	file = fget(fd);
-	if (!file)
-		return -EBADF;
-
 	if (!(file->f_mode & FMODE_READ))
 		goto out;
 
@@ -1184,12 +1179,25 @@ compat_sys_readv(unsigned long fd, const struct compat_iovec __user *vec, unsign
 	if (!file->f_op || (!file->f_op->aio_read && !file->f_op->read))
 		goto out;
 
-	ret = compat_do_readv_writev(READ, file, vec, vlen, &file->f_pos);
+	ret = compat_do_readv_writev(READ, file, vec, vlen, pos);
 
 out:
 	if (ret > 0)
 		add_rchar(current, ret);
 	inc_syscr(current);
+	return ret;
+}
+
+asmlinkage ssize_t
+compat_sys_readv(unsigned long fd, const struct compat_iovec __user *vec, unsigned long vlen)
+{
+	struct file *file;
+	ssize_t ret;
+
+	file = fget(fd);
+	if (!file)
+		return -EBADF;
+	ret = compat_readv(file, vec, vlen, &file->f_pos);
 	fput(file);
 	return ret;
 }
-- 
1.6.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/5] create compat_writev()
  2009-01-16 16:45 [PATCH v6 0/5] Add preadv & pwritev system calls Gerd Hoffmann
  2009-01-16 16:45 ` [PATCH 1/5] create compat_readv() Gerd Hoffmann
@ 2009-01-16 16:45 ` Gerd Hoffmann
  2009-01-16 17:28   ` Arnd Bergmann
  2009-01-16 16:45 ` [PATCH 3/5] Add preadv and pwritev system calls Gerd Hoffmann
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Gerd Hoffmann @ 2009-01-16 16:45 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-api; +Cc: aarcange, Gerd Hoffmann

Factor out some code from compat_sys_writev() which can be shared with the
upcoming compat_sys_pwritev().

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 fs/compat.c |   25 +++++++++++++++++--------
 1 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index 63738cc..9f20d10 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1202,15 +1202,11 @@ compat_sys_readv(unsigned long fd, const struct compat_iovec __user *vec, unsign
 	return ret;
 }
 
-asmlinkage ssize_t
-compat_sys_writev(unsigned long fd, const struct compat_iovec __user *vec, unsigned long vlen)
+static size_t compat_writev(struct file *file, const struct compat_iovec __user *vec,
+                            unsigned long vlen, loff_t *pos)
 {
-	struct file *file;
 	ssize_t ret = -EBADF;
 
-	file = fget(fd);
-	if (!file)
-		return -EBADF;
 	if (!(file->f_mode & FMODE_WRITE))
 		goto out;
 
@@ -1218,16 +1214,29 @@ compat_sys_writev(unsigned long fd, const struct compat_iovec __user *vec, unsig
 	if (!file->f_op || (!file->f_op->aio_write && !file->f_op->write))
 		goto out;
 
-	ret = compat_do_readv_writev(WRITE, file, vec, vlen, &file->f_pos);
+	ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos);
 
 out:
 	if (ret > 0)
 		add_wchar(current, ret);
 	inc_syscw(current);
-	fput(file);
 	return ret;
 }
 
+asmlinkage ssize_t
+compat_sys_writev(unsigned long fd, const struct compat_iovec __user *vec, unsigned long vlen)
+{
+	struct file *file;
+	ssize_t ret;
+
+	file = fget(fd);
+	if (!file)
+		return -EBADF;
+	ret = compat_writev(file, vec, vlen, &file->f_pos);
+        fput(file);
+        return ret;
+}
+
 asmlinkage long
 compat_sys_vmsplice(int fd, const struct compat_iovec __user *iov32,
 		    unsigned int nr_segs, unsigned int flags)
-- 
1.6.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/5] Add preadv and pwritev system calls.
  2009-01-16 16:45 [PATCH v6 0/5] Add preadv & pwritev system calls Gerd Hoffmann
  2009-01-16 16:45 ` [PATCH 1/5] create compat_readv() Gerd Hoffmann
  2009-01-16 16:45 ` [PATCH 2/5] create compat_writev() Gerd Hoffmann
@ 2009-01-16 16:45 ` Gerd Hoffmann
  2009-01-16 17:34   ` Arnd Bergmann
  2009-01-16 16:45 ` [PATCH 4/5] MIPS: Add preadv(2) and pwritev(2) syscalls Gerd Hoffmann
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Gerd Hoffmann @ 2009-01-16 16:45 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-api; +Cc: aarcange, Gerd Hoffmann

This patch adds preadv and pwritev system calls.  These syscalls are a
pretty straightforward combination of pread and readv (same for write).
They are quite useful for doing vectored I/O in threaded applications.
Using lseek+readv instead opens race windows you'll have to plug with
locking.

Other systems have such system calls too, for example NetBSD, check
here: http://www.daemon-systems.org/man/preadv.2.html

The application-visible interface provided by glibc should look like
this to be compatible to the existing implementations in the *BSD family:

  ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
  ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);

This prototype has one problem though:  On 32bit archs is the (64bit)
offset argument unaligned, which the syscall ABI of several archs
doesn't allow to do.  At least s390 needs a wrapper in glibc to handle
this.  As we'll need a wrappers in glibc anyway I've decided to push
problem to glibc entriely and use a syscall prototype which works
without arch-specific wrappers inside the kernel:  The offset argument
is explicitly splitted into two 32bit values.

The patch sports the actual system call implementation and the windup in
the x86 system call tables.  Other archs follow as separate patches.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 arch/x86/ia32/ia32entry.S          |    2 +
 arch/x86/include/asm/unistd_32.h   |    2 +
 arch/x86/include/asm/unistd_64.h   |    4 +++
 arch/x86/kernel/syscall_table_32.S |    2 +
 fs/compat.c                        |   36 ++++++++++++++++++++++++++
 fs/read_write.c                    |   50 ++++++++++++++++++++++++++++++++++++
 include/linux/compat.h             |    6 ++++
 include/linux/syscalls.h           |    4 +++
 8 files changed, 106 insertions(+), 0 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 256b00b..9a8501b 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -826,4 +826,6 @@ ia32_sys_call_table:
 	.quad sys_dup3			/* 330 */
 	.quad sys_pipe2
 	.quad sys_inotify_init1
+	.quad compat_sys_preadv
+	.quad compat_sys_pwritev
 ia32_syscall_end:
diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index f2bba78..6e72d74 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -338,6 +338,8 @@
 #define __NR_dup3		330
 #define __NR_pipe2		331
 #define __NR_inotify_init1	332
+#define __NR_preadv		333
+#define __NR_pwritev		334
 
 #ifdef __KERNEL__
 
diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index d2e415e..f818294 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -653,6 +653,10 @@ __SYSCALL(__NR_dup3, sys_dup3)
 __SYSCALL(__NR_pipe2, sys_pipe2)
 #define __NR_inotify_init1			294
 __SYSCALL(__NR_inotify_init1, sys_inotify_init1)
+#define __NR_preadv				295
+__SYSCALL(__NR_preadv, sys_preadv)
+#define __NR_pwritev				296
+__SYSCALL(__NR_pwritev, sys_pwritev)
 
 
 #ifndef __NO_STUBS
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index e2e86a0..106204c 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -332,3 +332,5 @@ ENTRY(sys_call_table)
 	.long sys_dup3			/* 330 */
 	.long sys_pipe2
 	.long sys_inotify_init1
+	.long sys_preadv
+	.long sys_pwritev
diff --git a/fs/compat.c b/fs/compat.c
index 9f20d10..20fec0e 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1202,6 +1202,24 @@ compat_sys_readv(unsigned long fd, const struct compat_iovec __user *vec, unsign
 	return ret;
 }
 
+asmlinkage ssize_t
+compat_sys_preadv(unsigned long fd, const struct compat_iovec __user *vec,
+                  unsigned long vlen, u32 pos_high, u32 pos_low)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+	struct file *file;
+	ssize_t ret;
+
+	if (pos < 0)
+		return -EINVAL;
+	file = fget(fd);
+	if (!file)
+		return -EBADF;
+	ret = compat_readv(file, vec, vlen, &pos);
+	fput(file);
+	return ret;
+}
+
 static size_t compat_writev(struct file *file, const struct compat_iovec __user *vec,
                             unsigned long vlen, loff_t *pos)
 {
@@ -1237,6 +1255,24 @@ compat_sys_writev(unsigned long fd, const struct compat_iovec __user *vec, unsig
         return ret;
 }
 
+asmlinkage ssize_t
+compat_sys_pwritev(unsigned long fd, const struct compat_iovec __user *vec,
+                   unsigned long vlen, u32 pos_high, u32 pos_low)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+	struct file *file;
+	ssize_t ret;
+
+	if (pos < 0)
+		return -EINVAL;
+	file = fget(fd);
+	if (!file)
+		return -EBADF;
+	ret = compat_writev(file, vec, vlen, &pos);
+	fput(file);
+	return ret;
+}
+
 asmlinkage long
 compat_sys_vmsplice(int fd, const struct compat_iovec __user *iov32,
 		    unsigned int nr_segs, unsigned int flags)
diff --git a/fs/read_write.c b/fs/read_write.c
index 400fe81..39de95c 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -731,6 +731,56 @@ SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
+SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
+                unsigned long, vlen, u32, pos_high, u32, pos_low)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+	struct file *file;
+	ssize_t ret = -EBADF;
+	int fput_needed;
+
+	if (pos < 0)
+		return -EINVAL;
+
+	file = fget_light(fd, &fput_needed);
+	if (file) {
+		ret = -ESPIPE;
+		if (file->f_mode & FMODE_PREAD)
+			ret = vfs_readv(file, vec, vlen, &pos);
+		fput_light(file, fput_needed);
+	}
+
+	if (ret > 0)
+		add_rchar(current, ret);
+	inc_syscr(current);
+	return ret;
+}
+
+SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
+                unsigned long, vlen, u32, pos_high, u32, pos_low)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+	struct file *file;
+	ssize_t ret = -EBADF;
+	int fput_needed;
+
+	if (pos < 0)
+		return -EINVAL;
+
+	file = fget_light(fd, &fput_needed);
+	if (file) {
+		ret = -ESPIPE;
+		if (file->f_mode & FMODE_PWRITE)
+			ret = vfs_writev(file, vec, vlen, &pos);
+		fput_light(file, fput_needed);
+	}
+
+	if (ret > 0)
+		add_wchar(current, ret);
+	inc_syscw(current);
+	return ret;
+}
+
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 			   size_t count, loff_t max)
 {
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 3fd2194..79dba49 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -183,6 +183,12 @@ asmlinkage ssize_t compat_sys_readv(unsigned long fd,
 		const struct compat_iovec __user *vec, unsigned long vlen);
 asmlinkage ssize_t compat_sys_writev(unsigned long fd,
 		const struct compat_iovec __user *vec, unsigned long vlen);
+asmlinkage ssize_t compat_sys_preadv(unsigned long fd,
+		const struct compat_iovec __user *vec,
+		unsigned long vlen, u32 pos_high, u32 pos_low);
+asmlinkage ssize_t compat_sys_pwritev(unsigned long fd,
+		const struct compat_iovec __user *vec,
+		unsigned long vlen, u32 pos_high, u32 pos_low);
 
 int compat_do_execve(char * filename, compat_uptr_t __user *argv,
 	        compat_uptr_t __user *envp, struct pt_regs * regs);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 16875f8..333377e 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -456,6 +456,10 @@ asmlinkage long sys_pread64(unsigned int fd, char __user *buf,
 			    size_t count, loff_t pos);
 asmlinkage long sys_pwrite64(unsigned int fd, const char __user *buf,
 			     size_t count, loff_t pos);
+asmlinkage long sys_preadv(unsigned long fd, const struct iovec __user *vec,
+                           unsigned long vlen, u32 pos_high, u32 pos_low);
+asmlinkage long sys_pwritev(unsigned long fd, const struct iovec __user *vec,
+                            unsigned long vlen, u32 pos_high, u32 pos_low);
 asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
 asmlinkage long sys_mkdir(const char __user *pathname, int mode);
 asmlinkage long sys_chdir(const char __user *filename);
-- 
1.6.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/5] MIPS: Add preadv(2) and pwritev(2) syscalls.
  2009-01-16 16:45 [PATCH v6 0/5] Add preadv & pwritev system calls Gerd Hoffmann
                   ` (2 preceding siblings ...)
  2009-01-16 16:45 ` [PATCH 3/5] Add preadv and pwritev system calls Gerd Hoffmann
@ 2009-01-16 16:45 ` Gerd Hoffmann
  2009-01-16 16:45 ` [PATCH 5/5] switch compat readv/preadv/writev/pwritev from fget to fget_light Gerd Hoffmann
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Gerd Hoffmann @ 2009-01-16 16:45 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-api; +Cc: aarcange, Ralf Baechle, Gerd Hoffmann

From: Ralf Baechle <ralf@linux-mips.org>

From: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 arch/mips/include/asm/unistd.h |   18 ++++++++++++------
 arch/mips/kernel/scall32-o32.S |    2 ++
 arch/mips/kernel/scall64-64.S  |    2 ++
 arch/mips/kernel/scall64-n32.S |    2 ++
 arch/mips/kernel/scall64-o32.S |    2 ++
 5 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h
index a73e153..4000501 100644
--- a/arch/mips/include/asm/unistd.h
+++ b/arch/mips/include/asm/unistd.h
@@ -350,16 +350,18 @@
 #define __NR_dup3			(__NR_Linux + 327)
 #define __NR_pipe2			(__NR_Linux + 328)
 #define __NR_inotify_init1		(__NR_Linux + 329)
+#define __NR_preadv			(__NR_Linux + 330)
+#define __NR_pwritev			(__NR_Linux + 331)
 
 /*
  * Offset of the last Linux o32 flavoured syscall
  */
-#define __NR_Linux_syscalls		329
+#define __NR_Linux_syscalls		331
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
 
 #define __NR_O32_Linux			4000
-#define __NR_O32_Linux_syscalls		329
+#define __NR_O32_Linux_syscalls		331
 
 #if _MIPS_SIM == _MIPS_SIM_ABI64
 
@@ -656,16 +658,18 @@
 #define __NR_dup3			(__NR_Linux + 286)
 #define __NR_pipe2			(__NR_Linux + 287)
 #define __NR_inotify_init1		(__NR_Linux + 288)
+#define __NR_preadv			(__NR_Linux + 289)
+#define __NR_pwritev			(__NR_Linux + 290)
 
 /*
  * Offset of the last Linux 64-bit flavoured syscall
  */
-#define __NR_Linux_syscalls		288
+#define __NR_Linux_syscalls		290
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
 
 #define __NR_64_Linux			5000
-#define __NR_64_Linux_syscalls		288
+#define __NR_64_Linux_syscalls		290
 
 #if _MIPS_SIM == _MIPS_SIM_NABI32
 
@@ -966,16 +970,18 @@
 #define __NR_dup3			(__NR_Linux + 290)
 #define __NR_pipe2			(__NR_Linux + 291)
 #define __NR_inotify_init1		(__NR_Linux + 292)
+#define __NR_preadv			(__NR_Linux + 293)
+#define __NR_pwritev			(__NR_Linux + 294)
 
 /*
  * Offset of the last N32 flavoured syscall
  */
-#define __NR_Linux_syscalls		292
+#define __NR_Linux_syscalls		294
 
 #endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */
 
 #define __NR_N32_Linux			6000
-#define __NR_N32_Linux_syscalls		292
+#define __NR_N32_Linux_syscalls		294
 
 #ifdef __KERNEL__
 
diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index 51d1ba4..0198a9c 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -650,6 +650,8 @@ einval:	li	v0, -ENOSYS
 	sys	sys_dup3		3
 	sys	sys_pipe2		2
 	sys	sys_inotify_init1	1
+	sys	sys_preadv		6	/* 4330 */
+	sys	sys_pwritev		6
 	.endm
 
 	/* We pre-compute the number of _instruction_ bytes needed to
diff --git a/arch/mips/kernel/scall64-64.S b/arch/mips/kernel/scall64-64.S
index a9e1716..217e3ce 100644
--- a/arch/mips/kernel/scall64-64.S
+++ b/arch/mips/kernel/scall64-64.S
@@ -487,4 +487,6 @@ sys_call_table:
 	PTR	sys_dup3
 	PTR	sys_pipe2
 	PTR	sys_inotify_init1
+	PTR	sys_preadv
+	PTR	sys_pwritev			/* 5390 */
 	.size	sys_call_table,.-sys_call_table
diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
index 30f3b63..f340963 100644
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -413,4 +413,6 @@ EXPORT(sysn32_call_table)
 	PTR	sys_dup3			/* 5290 */
 	PTR	sys_pipe2
 	PTR	sys_inotify_init1
+	PTR	sys_preadv
+	PTR	sys_pwritev
 	.size	sysn32_call_table,.-sysn32_call_table
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index fefef4a..b1d281a 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -533,4 +533,6 @@ sys_call_table:
 	PTR	sys_dup3
 	PTR	sys_pipe2
 	PTR	sys_inotify_init1
+	PTR	compat_sys_preadv		/* 4330 */
+	PTR	compat_sys_pwritev
 	.size	sys_call_table,.-sys_call_table
-- 
1.6.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/5] switch compat readv/preadv/writev/pwritev from fget to fget_light
  2009-01-16 16:45 [PATCH v6 0/5] Add preadv & pwritev system calls Gerd Hoffmann
                   ` (3 preceding siblings ...)
  2009-01-16 16:45 ` [PATCH 4/5] MIPS: Add preadv(2) and pwritev(2) syscalls Gerd Hoffmann
@ 2009-01-16 16:45 ` Gerd Hoffmann
  2009-01-16 16:53 ` [PATCH v6 0/5] Add preadv & pwritev system calls Michael Kerrisk
  2009-01-16 17:52 ` Arnd Bergmann
  6 siblings, 0 replies; 14+ messages in thread
From: Gerd Hoffmann @ 2009-01-16 16:45 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-api; +Cc: aarcange, Gerd Hoffmann


Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 fs/compat.c |   20 ++++++++++++--------
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index 20fec0e..90a10e9 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1192,13 +1192,14 @@ asmlinkage ssize_t
 compat_sys_readv(unsigned long fd, const struct compat_iovec __user *vec, unsigned long vlen)
 {
 	struct file *file;
+	int fput_needed;
 	ssize_t ret;
 
-	file = fget(fd);
+	file = fget_light(fd, &fput_needed);
 	if (!file)
 		return -EBADF;
 	ret = compat_readv(file, vec, vlen, &file->f_pos);
-	fput(file);
+	fput_light(file, fput_needed);
 	return ret;
 }
 
@@ -1208,15 +1209,16 @@ compat_sys_preadv(unsigned long fd, const struct compat_iovec __user *vec,
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 	struct file *file;
+	int fput_needed;
 	ssize_t ret;
 
 	if (pos < 0)
 		return -EINVAL;
-	file = fget(fd);
+	file = fget_light(fd, &fput_needed);
 	if (!file)
 		return -EBADF;
 	ret = compat_readv(file, vec, vlen, &pos);
-	fput(file);
+	fput_light(file, fput_needed);
 	return ret;
 }
 
@@ -1245,13 +1247,14 @@ asmlinkage ssize_t
 compat_sys_writev(unsigned long fd, const struct compat_iovec __user *vec, unsigned long vlen)
 {
 	struct file *file;
+	int fput_needed;
 	ssize_t ret;
 
-	file = fget(fd);
+	file = fget_light(fd, &fput_needed);
 	if (!file)
 		return -EBADF;
 	ret = compat_writev(file, vec, vlen, &file->f_pos);
-        fput(file);
+        fput_light(file, fput_needed);
         return ret;
 }
 
@@ -1261,15 +1264,16 @@ compat_sys_pwritev(unsigned long fd, const struct compat_iovec __user *vec,
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 	struct file *file;
+	int fput_needed;
 	ssize_t ret;
 
 	if (pos < 0)
 		return -EINVAL;
-	file = fget(fd);
+	file = fget_light(fd, &fput_needed);
 	if (!file)
 		return -EBADF;
 	ret = compat_writev(file, vec, vlen, &pos);
-	fput(file);
+	fput_light(file, fput_needed);
 	return ret;
 }
 
-- 
1.6.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 0/5] Add preadv & pwritev system calls.
  2009-01-16 16:45 [PATCH v6 0/5] Add preadv & pwritev system calls Gerd Hoffmann
                   ` (4 preceding siblings ...)
  2009-01-16 16:45 ` [PATCH 5/5] switch compat readv/preadv/writev/pwritev from fget to fget_light Gerd Hoffmann
@ 2009-01-16 16:53 ` Michael Kerrisk
  2009-01-16 17:52 ` Arnd Bergmann
  6 siblings, 0 replies; 14+ messages in thread
From: Michael Kerrisk @ 2009-01-16 16:53 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: linux-kernel, linux-arch, linux-api, aarcange

Gerd,

On Sat, Jan 17, 2009 at 5:45 AM, Gerd Hoffmann <kraxel@redhat.com> wrote:
>  Hi folks,
>
> Next round of the preadv & pwritev patch series.

Have you somewhere along the way posted some userland example/test
programs for these syscalls?  If not, could you provide some?

Cheers,

Michael

> Had no review comments
> to fix.  That means it is ready to be merged, right?
>
> Changes:
>  - compat_sys_{read,write}v bugfix patch dropped (merged).
>  - rebase to latest git, adapt to CVE-2009-0029 changes.
>
> How to proceed now?  Is there a syscall maintainer where I could queue
> up the patches?  If not, anyone (akpm?) willng to pick this up?  Should
> I try to send to Linus directly?
>
> What is the usual way to handle the arch-specific syscall windup?  I'd
> prefer to leave that to the arch maintainers as they know best what
> needs to be done, is that ok?  Right now only x86 (/me) and mips (patch
> from Ralf Baechle) is covered ...
>
> cheers,
>  Gerd
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/5] create compat_writev()
  2009-01-16 16:45 ` [PATCH 2/5] create compat_writev() Gerd Hoffmann
@ 2009-01-16 17:28   ` Arnd Bergmann
  0 siblings, 0 replies; 14+ messages in thread
From: Arnd Bergmann @ 2009-01-16 17:28 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: linux-kernel, linux-arch, linux-api, aarcange

On Friday 16 January 2009, Gerd Hoffmann wrote:
> +asmlinkage ssize_t
> +compat_sys_writev(unsigned long fd, const struct compat_iovec __user *vec, unsigned long vlen)
> +{
> +       struct file *file;
> +       ssize_t ret;
> +
> +       file = fget(fd);
> +       if (!file)
> +               return -EBADF;
> +       ret = compat_writev(file, vec, vlen, &file->f_pos);
> +        fput(file);
> +        return ret;
> +}

This one still looks whitespace damaged, did you run the latest version
through checkpatch.pl?

	Arnd <><

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] Add preadv and pwritev system calls.
  2009-01-16 16:45 ` [PATCH 3/5] Add preadv and pwritev system calls Gerd Hoffmann
@ 2009-01-16 17:34   ` Arnd Bergmann
  0 siblings, 0 replies; 14+ messages in thread
From: Arnd Bergmann @ 2009-01-16 17:34 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: linux-kernel, linux-arch, linux-api, aarcange

On Friday 16 January 2009, Gerd Hoffmann wrote:
> +asmlinkage ssize_t compat_sys_preadv(unsigned long fd,
> +               const struct compat_iovec __user *vec,
> +               unsigned long vlen, u32 pos_high, u32 pos_low);
> +asmlinkage ssize_t compat_sys_pwritev(unsigned long fd,
> +               const struct compat_iovec __user *vec,
> +               unsigned long vlen, u32 pos_high, u32 pos_low);
>  
>  int compat_do_execve(char * filename, compat_uptr_t __user *argv,
>                 compat_uptr_t __user *envp, struct pt_regs * regs);
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 16875f8..333377e 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -456,6 +456,10 @@ asmlinkage long sys_pread64(unsigned int fd, char __user *buf,
>                             size_t count, loff_t pos);
>  asmlinkage long sys_pwrite64(unsigned int fd, const char __user *buf,
>                              size_t count, loff_t pos);
> +asmlinkage long sys_preadv(unsigned long fd, const struct iovec __user *vec,
> +                           unsigned long vlen, u32 pos_high, u32 pos_low);
> +asmlinkage long sys_pwritev(unsigned long fd, const struct iovec __user *vec,
> +                            unsigned long vlen, u32 pos_high, u32 pos_low);

Conventionally, the 'fd' argument has type 'int', not 'unsigned long', but you
evidently copied this from readv/writev, so you can't really be blamed for it.
Not sure what the right thing to do here is.

	Arnd <><

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 0/5] Add preadv & pwritev system calls.
  2009-01-16 16:45 [PATCH v6 0/5] Add preadv & pwritev system calls Gerd Hoffmann
                   ` (5 preceding siblings ...)
  2009-01-16 16:53 ` [PATCH v6 0/5] Add preadv & pwritev system calls Michael Kerrisk
@ 2009-01-16 17:52 ` Arnd Bergmann
  2009-01-16 19:20   ` Ulrich Drepper
  6 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2009-01-16 17:52 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: linux-kernel, linux-arch, linux-api, aarcange, Ulrich Drepper

On Friday 16 January 2009, Gerd Hoffmann wrote:
> Next round of the preadv & pwritev patch series.  Had no review comments
> to fix.  That means it is ready to be merged, right?

In general, getting no feedback does not mean something's ready ;-)

Your patches look pretty polished though, and IIRC the only
real objection was to whether or not the syscalls should be
there in the first place. Did you get any feedback from Ulrich
Drepper as to whether he plans to add support to glibc?

> Changes:
>  - compat_sys_{read,write}v bugfix patch dropped (merged).
>  - rebase to latest git, adapt to CVE-2009-0029 changes.
> 
> How to proceed now?  Is there a syscall maintainer where I could queue
> up the patches?  If not, anyone (akpm?) willng to pick this up?  Should
> I try to send to Linus directly?
> 
> What is the usual way to handle the arch-specific syscall windup?  I'd
> prefer to leave that to the arch maintainers as they know best what
> needs to be done, is that ok?  Right now only x86 (/me) and mips (patch
> from Ralf Baechle) is covered ...

I'd say get it into linux-next as a git tree, then let the arch maintainers
send you the missing patches to hook up the syscalls so that it can go
in as one chunk.

Have you done the glibc patch already? You probably also need to provide
an alternative user space implementation based on a pread/pwrite loop for
older kernels.

	Arnd <><

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 0/5] Add preadv & pwritev system calls.
  2009-01-16 17:52 ` Arnd Bergmann
@ 2009-01-16 19:20   ` Ulrich Drepper
  2009-01-19 14:20     ` Gerd Hoffmann
  0 siblings, 1 reply; 14+ messages in thread
From: Ulrich Drepper @ 2009-01-16 19:20 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Gerd Hoffmann, linux-kernel, linux-arch, linux-api, aarcange

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Arnd Bergmann wrote:
> Did you get any feedback from Ulrich
> Drepper as to whether he plans to add support to glibc?

If they are in the kernel there is no reason not to export them from
glibc.  But I have a general comment about all kinds of read syscalls.
If think they have been misdesigned from day one and if we are going to
add new ones we might want to fix them.

The problem is that they don't allow for zero-copy operations in enough
cases.  The kernel is not free to store the data wherever it wants even
if the userlevel code is fine with that.  Ideally the program would tell
the kernel that it is fine with any addressable address and provides a
buffer for the kernel to use in case zero-copy into that buffer is
possible or no zero-copy is possible at all.  An interface could look
like this:

   ssize_t readz (int fd, void *buf, size_t len, void **res)

(and accordingly for similar calls).  The application will then use the
pointer stored at the address pointed to by the fourth parameter instead
of unconditionally using the buffer pointed to by the second parameter.
 For res==NULL the semantics could be the same as the normal read().

This is not the only interface needed to make this work.  Somehow the
memory used for the zero-copy buffers has to be administrated.  At the
very least an interface to mark the buffer returned by readz() as unused
is needed.

There is a lot to think about before this can be done (something I
started back in my 2006 OLS paper [1]).  But I wonder whether it's worth
preparing for it and not add yet more interfaces which aren't ready for
this type of I/O.


[1] http://people.redhat.com/drepper/newni.pdf

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAklw3bIACgkQ2ijCOnn/RHSutgCgvIZki4gZfuLzwCOGkZqOf97v
1LYAn3fQj0C8CabsfvaYonFTZQ3oUtSn
=EDYF
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 0/5] Add preadv & pwritev system calls.
  2009-01-16 19:20   ` Ulrich Drepper
@ 2009-01-19 14:20     ` Gerd Hoffmann
  2009-01-21  0:11       ` Petr Baudis
  0 siblings, 1 reply; 14+ messages in thread
From: Gerd Hoffmann @ 2009-01-19 14:20 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Arnd Bergmann, linux-kernel, linux-arch, linux-api, aarcange

Ulrich Drepper wrote:
> If they are in the kernel there is no reason not to export them from
> glibc.

Great.

> But I have a general comment about all kinds of read syscalls.
> If think they have been misdesigned from day one and if we are going to
> add new ones we might want to fix them.
> 
> The problem is that they don't allow for zero-copy operations in enough
> cases.  The kernel is not free to store the data wherever it wants even
> if the userlevel code is fine with that.

[ ... more text snipped ... ]

I do see the point in adding a interface like this ...

>    ssize_t readz (int fd, void *buf, size_t len, void **res)

... to help the kernel do zero-copy I/O.

I think system calls for vector I/O are *not* the right place for that
though.  Usually applications use vectored I/O because they *do* care
about the place the data is stored, because vectored I/O allows them to
avoid copying data within the application.

cheers,
  Gerd



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 0/5] Add preadv & pwritev system calls.
  2009-01-19 14:20     ` Gerd Hoffmann
@ 2009-01-21  0:11       ` Petr Baudis
  2009-01-21  9:31         ` Gerd Hoffmann
  0 siblings, 1 reply; 14+ messages in thread
From: Petr Baudis @ 2009-01-21  0:11 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Ulrich Drepper, Arnd Bergmann, linux-kernel, linux-arch,
	linux-api, aarcange

On Mon, Jan 19, 2009 at 03:20:11PM +0100, Gerd Hoffmann wrote:
> I do see the point in adding a interface like this ...
> 
> >    ssize_t readz (int fd, void *buf, size_t len, void **res)
> 
> ... to help the kernel do zero-copy I/O.
> 
> I think system calls for vector I/O are *not* the right place for that
> though.  Usually applications use vectored I/O because they *do* care
> about the place the data is stored, because vectored I/O allows them to
> avoid copying data within the application.

Can you elaborate on this? An application would have to have quite a
contrived design if its pointers simply cannot be updated according
to what the kernel returns.

Then again, I'm not sure why wouldn't readv() actually be
zerocopy-ready. Just make sure you handle iov_base being NULL gracefully
now (EINVAL, with the remark that the kernel can write to the iovec
memory area in the future) and later the kernel can in that case set
iov_base to the buffer location?

-- 
				Petr "Pasky" Baudis
The average, healthy, well-adjusted adult gets up at seven-thirty
in the morning feeling just terrible. -- Jean Kerr

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 0/5] Add preadv & pwritev system calls.
  2009-01-21  0:11       ` Petr Baudis
@ 2009-01-21  9:31         ` Gerd Hoffmann
  0 siblings, 0 replies; 14+ messages in thread
From: Gerd Hoffmann @ 2009-01-21  9:31 UTC (permalink / raw)
  To: Petr Baudis
  Cc: Ulrich Drepper, Arnd Bergmann, linux-kernel, linux-arch,
	linux-api, aarcange

Petr Baudis wrote:
> On Mon, Jan 19, 2009 at 03:20:11PM +0100, Gerd Hoffmann wrote:
>> I do see the point in adding a interface like this ...
>>
>>>    ssize_t readz (int fd, void *buf, size_t len, void **res)
>> ... to help the kernel do zero-copy I/O.
>>
>> I think system calls for vector I/O are *not* the right place for that
>> though.  Usually applications use vectored I/O because they *do* care
>> about the place the data is stored, because vectored I/O allows them to
>> avoid copying data within the application.
> 
> Can you elaborate on this? An application would have to have quite a
> contrived design if its pointers simply cannot be updated according
> to what the kernel returns.

Well.  The "just update the pointers" argument is bogous.  It simply
doesn't work in general for a number of reasons:

First, it assumes that there is a pointer you can update in the first place.

Second, it assumes you can easily update all pointer instances pointing
to your data.  Finding all places which need updating might be
non-trivial or impossible.  This tends to be true for refcounted data
structures.

Third, updating pointers can have extra costs, such as locking
requirements in threaded applications.

Of course there are also tons of applications where it is absolutely no
problem to just update the pointer.

But I think that applications using the vectored I/O very likely belong
to the group which can't.  Otherwise there would be little reason to use
the vectored API in the first place.

> Then again, I'm not sure why wouldn't readv() actually be
> zerocopy-ready. Just make sure you handle iov_base being NULL gracefully
> now (EINVAL, with the remark that the kernel can write to the iovec
> memory area in the future) and later the kernel can in that case set
> iov_base to the buffer location?

You could (assuming you don't break POSIX along the way).  I don't think
apps would use such an interface though for the reasons outlined above.
 The readz() prototype by Ullrich (without iovecs being involved) looks
much more reasonable to me.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-01-21  9:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-16 16:45 [PATCH v6 0/5] Add preadv & pwritev system calls Gerd Hoffmann
2009-01-16 16:45 ` [PATCH 1/5] create compat_readv() Gerd Hoffmann
2009-01-16 16:45 ` [PATCH 2/5] create compat_writev() Gerd Hoffmann
2009-01-16 17:28   ` Arnd Bergmann
2009-01-16 16:45 ` [PATCH 3/5] Add preadv and pwritev system calls Gerd Hoffmann
2009-01-16 17:34   ` Arnd Bergmann
2009-01-16 16:45 ` [PATCH 4/5] MIPS: Add preadv(2) and pwritev(2) syscalls Gerd Hoffmann
2009-01-16 16:45 ` [PATCH 5/5] switch compat readv/preadv/writev/pwritev from fget to fget_light Gerd Hoffmann
2009-01-16 16:53 ` [PATCH v6 0/5] Add preadv & pwritev system calls Michael Kerrisk
2009-01-16 17:52 ` Arnd Bergmann
2009-01-16 19:20   ` Ulrich Drepper
2009-01-19 14:20     ` Gerd Hoffmann
2009-01-21  0:11       ` Petr Baudis
2009-01-21  9:31         ` Gerd Hoffmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).