From: Alexey Dobriyan <adobriyan@gmail.com>
To: akpm@linux-foundation.org, viro@zeniv.linux.org.uk
Cc: torvalds@linux-foundation.org, drepper@gmail.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH] nextfd(2)
Date: Sun, 1 Apr 2012 15:57:42 +0300
Message-ID: <20120401125741.GA7484@p183.telecom.by>

Currently there is no reliable way to close all open file descriptors
(which daemons need and like to do):

* a dumb close(fd) loop is slow: the upper bound is unknown and
  can be arbitrarily large,

* /proc/self/fd is unreliable:
  proc may be unconfigured or not mounted at the expected place.
  Looking at /proc/self/fd requires opening a directory,
  which may fail due to a malicious rlimit drop or ENOMEM situations.
  Probing /proc/self/fd without opening the directory is equivalent to
  the dumb close(2) loop, except slower.
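
For reference, the two existing approaches criticised above can be sketched
roughly as follows. This is only an illustration: the helper names
close_loop() and close_via_procfs() are made up for this sketch, and the
upper bound is passed in by the caller because it can only be guessed
(typically from sysconf(_SC_OPEN_MAX) or a hard-coded 1024).

```c
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Approach 1: brute-force loop. Cost is linear in the guessed bound,
 * and descriptors at or above maxfd are silently missed. */
static void close_loop(int lowfd, int maxfd)
{
	for (int fd = lowfd; fd < maxfd; fd++)
		(void)close(fd);	/* EBADF for unused slots is ignored */
}

/* Approach 2: walk /proc/self/fd. Returns -1 if the directory cannot be
 * opened (proc not mounted, EMFILE, ENOMEM, ...), 0 otherwise. */
static int close_via_procfs(int lowfd)
{
	DIR *dir = opendir("/proc/self/fd");
	struct dirent *de;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		char *end;
		long fd = strtol(de->d_name, &end, 10);

		if (end == de->d_name || *end != '\0')
			continue;	/* skips "." and ".." */
		if (fd < lowfd || fd == dirfd(dir))
			continue;	/* keep the directory's own descriptor */
		(void)close((int)fd);
	}
	closedir(dir);
	return 0;
}
```

Note that the second helper itself needs a free descriptor for opendir(),
which is exactly the failure mode described above.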

BSD added closefrom(fd), which is OK for this exact purpose but suboptimal
on the bigger scale: closefrom(2) does only close(2) (obviously :-), and it
silently ignores errors from close(2), which in theory is not OK
for userspace.

So, don't add closefrom(2), add nextfd(2).

	int nextfd(int fd)

returns the next open file descriptor which is >= fd, or -1/ESRCH
if there are no open descriptors >= fd.
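
To make the intended semantics concrete, here is a userspace stand-in that
behaves the same way from the caller's point of view. It is not the proposed
syscall: nextfd_emulated() and its limit parameter are hypothetical, and it
probes each slot with fcntl(F_GETFD), so it has exactly the linear cost and
guessed bound that nextfd(2) is meant to avoid.

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical stand-in for nextfd(2): return the lowest open descriptor
 * in [fd, limit), or -1 with errno set to ESRCH if there is none. */
static int nextfd_emulated(int fd, int limit)
{
	if (fd < 0) {
		/* the syscall takes an unsigned int, so a negative value
		 * would be past the end of any descriptor table */
		errno = ESRCH;
		return -1;
	}
	for (; fd < limit; fd++) {
		if (fcntl(fd, F_GETFD) != -1)
			return fd;	/* slot is in use */
	}
	errno = ESRCH;
	return -1;
}
```

For example, with descriptors 0, 1, 2 and 7 open, nextfd_emulated(3, 1024)
returns 7, and nextfd_emulated(8, 1024) returns -1 with errno == ESRCH.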

Thus closefrom(3) can be rewritten through it in userspace:

	void closefrom(int fd)
	{
		while (1) {
			fd = nextfd(fd);
			/* -1/ESRCH: nothing left; also stop on any other error (e.g. ENOSYS) */
			if (fd == -1)
				break;
			(void)close(fd);
			fd++;
		}
	}

Maybe it will grow other smart uses.
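
One possible other use, as a sketch only: marking every descriptor >= lowfd
close-on-exec instead of closing it. Everything here is an assumption made
for illustration: cloexec_from() is a made-up helper, and since nextfd(2) is
only proposed, the loop runs on nextfd_standin(), a bounded fcntl() probe
with an arbitrary SCAN_LIMIT.

```c
#include <errno.h>
#include <fcntl.h>

enum { SCAN_LIMIT = 4096 };	/* arbitrary bound, needed by the stand-in only */

/* Stand-in for the proposed nextfd(2), NOT the real syscall. */
static int nextfd_standin(int fd)
{
	for (; fd >= 0 && fd < SCAN_LIMIT; fd++)
		if (fcntl(fd, F_GETFD) != -1)
			return fd;
	errno = ESRCH;
	return -1;
}

/* Set FD_CLOEXEC on every open descriptor >= lowfd.
 * Returns the number of descriptors successfully marked. */
static int cloexec_from(int lowfd)
{
	int marked = 0;

	for (int fd = nextfd_standin(lowfd); fd != -1;
	     fd = nextfd_standin(fd + 1)) {
		int flags = fcntl(fd, F_GETFD);

		if (flags != -1 && fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0)
			marked++;
	}
	return marked;
}
```

The loop has the same shape as the closefrom() above; only the action taken
on each descriptor differs.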

nextfd(2) doesn't change kernel state and thus can't fail,
which is why it should go in. Other means may fail,
may not be available, or require linear time with only guessed
upper boundaries (1024, getrlimit(RLIMIT_NOFILE), sysconf(_SC_OPEN_MAX)).
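
Why the rlimit is only a guess can be shown with a small sketch: lowering the
soft RLIMIT_NOFILE does not close descriptors that are already open above the
new limit, so a loop bounded by it never reaches them. The helper names below
are made up for this illustration.

```c
#include <fcntl.h>
#include <sys/resource.h>

/* Lower the soft RLIMIT_NOFILE to new_soft. Descriptors that are already
 * open at or above new_soft stay open. Returns 0 on success, -1 on error. */
static int lower_soft_nofile(rlim_t new_soft)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	if (new_soft > rl.rlim_max)
		return -1;	/* this sketch only lowers the limit */
	rl.rlim_cur = new_soft;
	return setrlimit(RLIMIT_NOFILE, &rl);
}

/* Returns 1 if fd is open but lies at or above the soft RLIMIT_NOFILE,
 * i.e. a close loop bounded by that limit would miss it; 0 if not;
 * -1 if the limit cannot be read. */
static int fd_escapes_rlimit_bound(int fd)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	if (fcntl(fd, F_GETFD) == -1)
		return 0;	/* not open at all */
	return (rlim_t)fd >= rl.rlim_cur;
}
```

After lower_soft_nofile(fd) for some open fd, fd_escapes_rlimit_bound(fd)
reports 1: the descriptor is still open, yet a loop running up to the soft
limit stops before it.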

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 arch/x86/syscalls/syscall_32.tbl |    1 +
 arch/x86/syscalls/syscall_64.tbl |    1 +
 fs/Makefile                      |    1 +
 fs/nextfd.c                      |   27 +++++++++++++++++++++++++++
 include/linux/syscalls.h         |    1 +
 5 files changed, 31 insertions(+)

--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -355,3 +355,4 @@
 346	i386	setns			sys_setns
 347	i386	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
 348	i386	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+349	i386	nextfd			sys_nextfd
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -318,6 +318,7 @@
 309	common	getcpu			sys_getcpu
 310	64	process_vm_readv	sys_process_vm_readv
 311	64	process_vm_writev	sys_process_vm_writev
+312	64	nextfd			sys_nextfd
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
 # for native 64-bit operation.
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -12,6 +12,7 @@ obj-y :=	open.o read_write.o file_table.o super.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o \
 		pnode.o drop_caches.o splice.o sync.o utimes.o \
 		stack.o fs_struct.o statfs.o
+obj-y += nextfd.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=	buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
--- /dev/null
+++ b/fs/nextfd.c
@@ -0,0 +1,27 @@
+#include <linux/errno.h>
+#include <linux/fdtable.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/syscalls.h>
+
+/* Return the first open file descriptor which is >= the argument. */
+SYSCALL_DEFINE1(nextfd, unsigned int, fd)
+{
+	struct files_struct *files = current->files;
+	struct fdtable *fdt;
+
+	rcu_read_lock();
+	fdt = files_fdtable(files);
+	while (fd < fdt->max_fds) {
+		struct file *file;
+
+		file = rcu_dereference_check_fdtable(files, fdt->fd[fd]);
+		if (file) {
+			rcu_read_unlock();
+			return fd;
+		}
+		fd++;
+	}
+	rcu_read_unlock();
+	return -ESRCH;
+}
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -857,5 +857,6 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
 				      const struct iovec __user *rvec,
 				      unsigned long riovcnt,
 				      unsigned long flags);
+asmlinkage long sys_nextfd(unsigned int fd);
 
 #endif
