linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [rfc 0/4] procfs fdinfo extension
@ 2012-05-17 16:07 Cyrill Gorcunov
  2012-05-17 16:07 ` [rfc 1/4] procfs: Move /proc/pid/fd[info] handling code to fd.[ch] Cyrill Gorcunov
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Cyrill Gorcunov @ 2012-05-17 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Pavel Emelyanov, James Bottomley, linux-fsdevel

Hi guys,

when we do restore files such as eventfd/eventpoll we need to pass
appropriate parameters to system calls. Unfortunately there is no
easy way to retrieve this information from the kernel. So to make
possible to obtain this kind of information the proc/pid/fdinfo/fd
handling code has been modified to print "extra" snippets depending
on file type.

The base idea is to make fdinfo being seq-files and plug-in helpers.
For example eventpoll files together with basic pos/flags now print
out target file descriptor number, events and data associated with
an event:

 | pos:	0
 | flags:	02
 | tfd:        5 events:       1d data: ffffffffffffffff

This series is early draft not for inclusion yet but just to
gather people opinions on idea in general and implementation
details. I tried to make the change as small as I can.

I'm working on fsnotify part at moment but since it's not
yet complete decided to not include this patch, it'll appear
later.

(I've been said that idea of extending fdinfo to provide
 additional data snippets was approved on LSF this year.
 Am I right, Pavel?)

Any comments and complains are highly appreciated!

	Cyrill

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [rfc 1/4] procfs: Move /proc/pid/fd[info] handling code to fd.[ch]
  2012-05-17 16:07 [rfc 0/4] procfs fdinfo extension Cyrill Gorcunov
@ 2012-05-17 16:07 ` Cyrill Gorcunov
  2012-05-17 16:07 ` [rfc 2/4] procfs: Convert /proc/pid/fdinfo/fd handling routines into seq-file with fdinfo helpers Cyrill Gorcunov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov @ 2012-05-17 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Pavel Emelyanov, James Bottomley, linux-fsdevel,
	Cyrill Gorcunov

[-- Attachment #1: seq-fdinfo-base-proc-2 --]
[-- Type: text/plain, Size: 23376 bytes --]

This prepares the ground for further extension of /proc/pid/fd[info]
handling code, also makes both fs/proc/base.c and fs/proc/fd.c easier
to read.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 fs/proc/Makefile   |    2 
 fs/proc/base.c     |  401 -----------------------------------------------------
 fs/proc/fd.c       |  370 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/proc/fd.h       |   14 +
 fs/proc/internal.h |   49 ++++++
 5 files changed, 436 insertions(+), 400 deletions(-)

Index: linux-2.6.git/fs/proc/Makefile
===================================================================
--- linux-2.6.git.orig/fs/proc/Makefile
+++ linux-2.6.git/fs/proc/Makefile
@@ -8,7 +8,7 @@ proc-y			:= nommu.o task_nommu.o
 proc-$(CONFIG_MMU)	:= mmu.o task_mmu.o
 
 proc-y       += inode.o root.o base.o generic.o array.o \
-		proc_tty.o
+		proc_tty.o fd.o
 proc-y	+= cmdline.o
 proc-y	+= consoles.o
 proc-y	+= cpuinfo.o
Index: linux-2.6.git/fs/proc/base.c
===================================================================
--- linux-2.6.git.orig/fs/proc/base.c
+++ linux-2.6.git/fs/proc/base.c
@@ -88,7 +88,9 @@
 #include <asm/hardwall.h>
 #endif
 #include <trace/events/oom.h>
+
 #include "internal.h"
+#include "fd.h"
 
 /* NOTE:
  *	Implementing inode permission operations in /proc is almost
@@ -135,8 +137,6 @@ struct pid_entry {
 		NULL, &proc_single_file_operations,	\
 		{ .proc_show = show } )
 
-static int proc_fd_permission(struct inode *inode, int mask);
-
 /*
  * Count the number of hardlinks for the pid_entry table, excluding the .
  * and .. links.
@@ -1485,30 +1485,14 @@ out:
 	return error;
 }
 
-static const struct inode_operations proc_pid_link_inode_operations = {
+const struct inode_operations proc_pid_link_inode_operations = {
 	.readlink	= proc_pid_readlink,
 	.follow_link	= proc_pid_follow_link,
 	.setattr	= proc_setattr,
 };
 
-
 /* building an inode */
 
-static int task_dumpable(struct task_struct *task)
-{
-	int dumpable = 0;
-	struct mm_struct *mm;
-
-	task_lock(task);
-	mm = task->mm;
-	if (mm)
-		dumpable = get_dumpable(mm);
-	task_unlock(task);
-	if(dumpable == 1)
-		return 1;
-	return 0;
-}
-
 struct inode *proc_pid_make_inode(struct super_block * sb, struct task_struct *task)
 {
 	struct inode * inode;
@@ -1634,15 +1618,6 @@ int pid_revalidate(struct dentry *dentry
 	return 0;
 }
 
-static int pid_delete_dentry(const struct dentry * dentry)
-{
-	/* Is the task we represent dead?
-	 * If so, then don't put the dentry on the lru list,
-	 * kill it immediately.
-	 */
-	return !proc_pid(dentry->d_inode)->tasks[PIDTYPE_PID].first;
-}
-
 const struct dentry_operations pid_dentry_operations =
 {
 	.d_revalidate	= pid_revalidate,
@@ -1705,300 +1680,6 @@ end_instantiate:
 	return filldir(dirent, name, len, filp->f_pos, ino, type);
 }
 
-static unsigned name_to_int(struct dentry *dentry)
-{
-	const char *name = dentry->d_name.name;
-	int len = dentry->d_name.len;
-	unsigned n = 0;
-
-	if (len > 1 && *name == '0')
-		goto out;
-	while (len-- > 0) {
-		unsigned c = *name++ - '0';
-		if (c > 9)
-			goto out;
-		if (n >= (~0U-9)/10)
-			goto out;
-		n *= 10;
-		n += c;
-	}
-	return n;
-out:
-	return ~0U;
-}
-
-#define PROC_FDINFO_MAX 64
-
-static int proc_fd_info(struct inode *inode, struct path *path, char *info)
-{
-	struct task_struct *task = get_proc_task(inode);
-	struct files_struct *files = NULL;
-	struct file *file;
-	int fd = proc_fd(inode);
-
-	if (task) {
-		files = get_files_struct(task);
-		put_task_struct(task);
-	}
-	if (files) {
-		/*
-		 * We are not taking a ref to the file structure, so we must
-		 * hold ->file_lock.
-		 */
-		spin_lock(&files->file_lock);
-		file = fcheck_files(files, fd);
-		if (file) {
-			unsigned int f_flags;
-			struct fdtable *fdt;
-
-			fdt = files_fdtable(files);
-			f_flags = file->f_flags & ~O_CLOEXEC;
-			if (close_on_exec(fd, fdt))
-				f_flags |= O_CLOEXEC;
-
-			if (path) {
-				*path = file->f_path;
-				path_get(&file->f_path);
-			}
-			if (info)
-				snprintf(info, PROC_FDINFO_MAX,
-					 "pos:\t%lli\n"
-					 "flags:\t0%o\n",
-					 (long long) file->f_pos,
-					 f_flags);
-			spin_unlock(&files->file_lock);
-			put_files_struct(files);
-			return 0;
-		}
-		spin_unlock(&files->file_lock);
-		put_files_struct(files);
-	}
-	return -ENOENT;
-}
-
-static int proc_fd_link(struct dentry *dentry, struct path *path)
-{
-	return proc_fd_info(dentry->d_inode, path, NULL);
-}
-
-static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
-{
-	struct inode *inode;
-	struct task_struct *task;
-	int fd;
-	struct files_struct *files;
-	const struct cred *cred;
-
-	if (nd && nd->flags & LOOKUP_RCU)
-		return -ECHILD;
-
-	inode = dentry->d_inode;
-	task = get_proc_task(inode);
-	fd = proc_fd(inode);
-
-	if (task) {
-		files = get_files_struct(task);
-		if (files) {
-			rcu_read_lock();
-			if (fcheck_files(files, fd)) {
-				rcu_read_unlock();
-				put_files_struct(files);
-				if (task_dumpable(task)) {
-					rcu_read_lock();
-					cred = __task_cred(task);
-					inode->i_uid = cred->euid;
-					inode->i_gid = cred->egid;
-					rcu_read_unlock();
-				} else {
-					inode->i_uid = 0;
-					inode->i_gid = 0;
-				}
-				inode->i_mode &= ~(S_ISUID | S_ISGID);
-				security_task_to_inode(task, inode);
-				put_task_struct(task);
-				return 1;
-			}
-			rcu_read_unlock();
-			put_files_struct(files);
-		}
-		put_task_struct(task);
-	}
-	d_drop(dentry);
-	return 0;
-}
-
-static const struct dentry_operations tid_fd_dentry_operations =
-{
-	.d_revalidate	= tid_fd_revalidate,
-	.d_delete	= pid_delete_dentry,
-};
-
-static struct dentry *proc_fd_instantiate(struct inode *dir,
-	struct dentry *dentry, struct task_struct *task, const void *ptr)
-{
-	unsigned fd = *(const unsigned *)ptr;
-	struct file *file;
-	struct files_struct *files;
- 	struct inode *inode;
- 	struct proc_inode *ei;
-	struct dentry *error = ERR_PTR(-ENOENT);
-
-	inode = proc_pid_make_inode(dir->i_sb, task);
-	if (!inode)
-		goto out;
-	ei = PROC_I(inode);
-	ei->fd = fd;
-	files = get_files_struct(task);
-	if (!files)
-		goto out_iput;
-	inode->i_mode = S_IFLNK;
-
-	/*
-	 * We are not taking a ref to the file structure, so we must
-	 * hold ->file_lock.
-	 */
-	spin_lock(&files->file_lock);
-	file = fcheck_files(files, fd);
-	if (!file)
-		goto out_unlock;
-	if (file->f_mode & FMODE_READ)
-		inode->i_mode |= S_IRUSR | S_IXUSR;
-	if (file->f_mode & FMODE_WRITE)
-		inode->i_mode |= S_IWUSR | S_IXUSR;
-	spin_unlock(&files->file_lock);
-	put_files_struct(files);
-
-	inode->i_op = &proc_pid_link_inode_operations;
-	inode->i_size = 64;
-	ei->op.proc_get_link = proc_fd_link;
-	d_set_d_op(dentry, &tid_fd_dentry_operations);
-	d_add(dentry, inode);
-	/* Close the race of the process dying before we return the dentry */
-	if (tid_fd_revalidate(dentry, NULL))
-		error = NULL;
-
- out:
-	return error;
-out_unlock:
-	spin_unlock(&files->file_lock);
-	put_files_struct(files);
-out_iput:
-	iput(inode);
-	goto out;
-}
-
-static struct dentry *proc_lookupfd_common(struct inode *dir,
-					   struct dentry *dentry,
-					   instantiate_t instantiate)
-{
-	struct task_struct *task = get_proc_task(dir);
-	unsigned fd = name_to_int(dentry);
-	struct dentry *result = ERR_PTR(-ENOENT);
-
-	if (!task)
-		goto out_no_task;
-	if (fd == ~0U)
-		goto out;
-
-	result = instantiate(dir, dentry, task, &fd);
-out:
-	put_task_struct(task);
-out_no_task:
-	return result;
-}
-
-static int proc_readfd_common(struct file * filp, void * dirent,
-			      filldir_t filldir, instantiate_t instantiate)
-{
-	struct dentry *dentry = filp->f_path.dentry;
-	struct inode *inode = dentry->d_inode;
-	struct task_struct *p = get_proc_task(inode);
-	unsigned int fd, ino;
-	int retval;
-	struct files_struct * files;
-
-	retval = -ENOENT;
-	if (!p)
-		goto out_no_task;
-	retval = 0;
-
-	fd = filp->f_pos;
-	switch (fd) {
-		case 0:
-			if (filldir(dirent, ".", 1, 0, inode->i_ino, DT_DIR) < 0)
-				goto out;
-			filp->f_pos++;
-		case 1:
-			ino = parent_ino(dentry);
-			if (filldir(dirent, "..", 2, 1, ino, DT_DIR) < 0)
-				goto out;
-			filp->f_pos++;
-		default:
-			files = get_files_struct(p);
-			if (!files)
-				goto out;
-			rcu_read_lock();
-			for (fd = filp->f_pos-2;
-			     fd < files_fdtable(files)->max_fds;
-			     fd++, filp->f_pos++) {
-				char name[PROC_NUMBUF];
-				int len;
-
-				if (!fcheck_files(files, fd))
-					continue;
-				rcu_read_unlock();
-
-				len = snprintf(name, sizeof(name), "%d", fd);
-				if (proc_fill_cache(filp, dirent, filldir,
-						    name, len, instantiate,
-						    p, &fd) < 0) {
-					rcu_read_lock();
-					break;
-				}
-				rcu_read_lock();
-			}
-			rcu_read_unlock();
-			put_files_struct(files);
-	}
-out:
-	put_task_struct(p);
-out_no_task:
-	return retval;
-}
-
-static struct dentry *proc_lookupfd(struct inode *dir, struct dentry *dentry,
-				    struct nameidata *nd)
-{
-	return proc_lookupfd_common(dir, dentry, proc_fd_instantiate);
-}
-
-static int proc_readfd(struct file *filp, void *dirent, filldir_t filldir)
-{
-	return proc_readfd_common(filp, dirent, filldir, proc_fd_instantiate);
-}
-
-static ssize_t proc_fdinfo_read(struct file *file, char __user *buf,
-				      size_t len, loff_t *ppos)
-{
-	char tmp[PROC_FDINFO_MAX];
-	int err = proc_fd_info(file->f_path.dentry->d_inode, NULL, tmp);
-	if (!err)
-		err = simple_read_from_buffer(buf, len, ppos, tmp, strlen(tmp));
-	return err;
-}
-
-static const struct file_operations proc_fdinfo_file_operations = {
-	.open           = nonseekable_open,
-	.read		= proc_fdinfo_read,
-	.llseek		= no_llseek,
-};
-
-static const struct file_operations proc_fd_operations = {
-	.read		= generic_read_dir,
-	.readdir	= proc_readfd,
-	.llseek		= default_llseek,
-};
-
 #ifdef CONFIG_CHECKPOINT_RESTORE
 
 /*
@@ -2344,82 +2025,6 @@ static const struct file_operations proc
 
 #endif /* CONFIG_CHECKPOINT_RESTORE */
 
-/*
- * /proc/pid/fd needs a special permission handler so that a process can still
- * access /proc/self/fd after it has executed a setuid().
- */
-static int proc_fd_permission(struct inode *inode, int mask)
-{
-	int rv = generic_permission(inode, mask);
-	if (rv == 0)
-		return 0;
-	if (task_pid(current) == proc_pid(inode))
-		rv = 0;
-	return rv;
-}
-
-/*
- * proc directories can do almost nothing..
- */
-static const struct inode_operations proc_fd_inode_operations = {
-	.lookup		= proc_lookupfd,
-	.permission	= proc_fd_permission,
-	.setattr	= proc_setattr,
-};
-
-static struct dentry *proc_fdinfo_instantiate(struct inode *dir,
-	struct dentry *dentry, struct task_struct *task, const void *ptr)
-{
-	unsigned fd = *(unsigned *)ptr;
- 	struct inode *inode;
- 	struct proc_inode *ei;
-	struct dentry *error = ERR_PTR(-ENOENT);
-
-	inode = proc_pid_make_inode(dir->i_sb, task);
-	if (!inode)
-		goto out;
-	ei = PROC_I(inode);
-	ei->fd = fd;
-	inode->i_mode = S_IFREG | S_IRUSR;
-	inode->i_fop = &proc_fdinfo_file_operations;
-	d_set_d_op(dentry, &tid_fd_dentry_operations);
-	d_add(dentry, inode);
-	/* Close the race of the process dying before we return the dentry */
-	if (tid_fd_revalidate(dentry, NULL))
-		error = NULL;
-
- out:
-	return error;
-}
-
-static struct dentry *proc_lookupfdinfo(struct inode *dir,
-					struct dentry *dentry,
-					struct nameidata *nd)
-{
-	return proc_lookupfd_common(dir, dentry, proc_fdinfo_instantiate);
-}
-
-static int proc_readfdinfo(struct file *filp, void *dirent, filldir_t filldir)
-{
-	return proc_readfd_common(filp, dirent, filldir,
-				  proc_fdinfo_instantiate);
-}
-
-static const struct file_operations proc_fdinfo_operations = {
-	.read		= generic_read_dir,
-	.readdir	= proc_readfdinfo,
-	.llseek		= default_llseek,
-};
-
-/*
- * proc directories can do almost nothing..
- */
-static const struct inode_operations proc_fdinfo_inode_operations = {
-	.lookup		= proc_lookupfdinfo,
-	.setattr	= proc_setattr,
-};
-
-
 static struct dentry *proc_pident_instantiate(struct inode *dir,
 	struct dentry *dentry, struct task_struct *task, const void *ptr)
 {
Index: linux-2.6.git/fs/proc/fd.c
===================================================================
--- /dev/null
+++ linux-2.6.git/fs/proc/fd.c
@@ -0,0 +1,370 @@
+#include <linux/sched.h>
+#include <linux/errno.h>
+#include <linux/dcache.h>
+#include <linux/path.h>
+#include <linux/fdtable.h>
+#include <linux/namei.h>
+#include <linux/pid.h>
+#include <linux/security.h>
+
+#include <linux/proc_fs.h>
+
+#include "internal.h"
+
+#define PROC_FDINFO_MAX 64
+
+static int proc_fd_info(struct inode *inode, struct path *path, char *info)
+{
+	struct task_struct *task = get_proc_task(inode);
+	struct files_struct *files = NULL;
+	int fd = proc_fd(inode);
+	struct file *file;
+
+	if (task) {
+		files = get_files_struct(task);
+		put_task_struct(task);
+	}
+
+	if (files) {
+		/*
+		 * We are not taking a ref to the file structure,
+		 * so we must hold ->file_lock.
+		 */
+		spin_lock(&files->file_lock);
+		file = fcheck_files(files, fd);
+
+		if (file) {
+			unsigned int f_flags;
+			struct fdtable *fdt;
+
+			fdt = files_fdtable(files);
+			f_flags = file->f_flags & ~O_CLOEXEC;
+			if (close_on_exec(fd, fdt))
+				f_flags |= O_CLOEXEC;
+
+			if (path) {
+				*path = file->f_path;
+				path_get(&file->f_path);
+			}
+
+			if (info)
+				snprintf(info, PROC_FDINFO_MAX,
+					 "pos:\t%lli\n"
+					 "flags:\t0%o\n",
+					 (long long)file->f_pos,
+					 f_flags);
+
+			spin_unlock(&files->file_lock);
+			put_files_struct(files);
+			return 0;
+		}
+
+		spin_unlock(&files->file_lock);
+		put_files_struct(files);
+	}
+
+	return -ENOENT;
+}
+
+static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
+{
+	struct files_struct *files;
+	struct task_struct *task;
+	const struct cred *cred;
+	struct inode *inode;
+	int fd;
+
+	if (nd && nd->flags & LOOKUP_RCU)
+		return -ECHILD;
+
+	inode = dentry->d_inode;
+	task = get_proc_task(inode);
+	fd = proc_fd(inode);
+
+	if (task) {
+		files = get_files_struct(task);
+		if (files) {
+			rcu_read_lock();
+			if (fcheck_files(files, fd)) {
+				rcu_read_unlock();
+				put_files_struct(files);
+				if (task_dumpable(task)) {
+					rcu_read_lock();
+					cred = __task_cred(task);
+					inode->i_uid = cred->euid;
+					inode->i_gid = cred->egid;
+					rcu_read_unlock();
+				} else {
+					inode->i_uid = 0;
+					inode->i_gid = 0;
+				}
+				inode->i_mode &= ~(S_ISUID | S_ISGID);
+				security_task_to_inode(task, inode);
+				put_task_struct(task);
+				return 1;
+			}
+			rcu_read_unlock();
+			put_files_struct(files);
+		}
+		put_task_struct(task);
+	}
+
+	d_drop(dentry);
+	return 0;
+}
+
+static const struct dentry_operations tid_fd_dentry_operations = {
+	.d_revalidate	= tid_fd_revalidate,
+	.d_delete	= pid_delete_dentry,
+};
+
+static int proc_fd_link(struct dentry *dentry, struct path *path)
+{
+	return proc_fd_info(dentry->d_inode, path, NULL);
+}
+
+static struct dentry *
+proc_fd_instantiate(struct inode *dir, struct dentry *dentry,
+		    struct task_struct *task, const void *ptr)
+{
+	struct dentry *error = ERR_PTR(-ENOENT);
+	unsigned fd = *(const unsigned *)ptr;
+	struct files_struct *files;
+	struct proc_inode *ei;
+	struct inode *inode;
+	struct file *file;
+
+	inode = proc_pid_make_inode(dir->i_sb, task);
+	if (!inode)
+		goto out;
+
+	ei = PROC_I(inode);
+	ei->fd = fd;
+
+	files = get_files_struct(task);
+	if (!files)
+		goto out_iput;
+
+	inode->i_mode = S_IFLNK;
+
+	/*
+	 * We are not taking a ref to the file structure, so we must
+	 * hold ->file_lock.
+	 */
+	spin_lock(&files->file_lock);
+	file = fcheck_files(files, fd);
+	if (!file)
+		goto out_unlock;
+	if (file->f_mode & FMODE_READ)
+		inode->i_mode |= S_IRUSR | S_IXUSR;
+	if (file->f_mode & FMODE_WRITE)
+		inode->i_mode |= S_IWUSR | S_IXUSR;
+	spin_unlock(&files->file_lock);
+	put_files_struct(files);
+
+	inode->i_op = &proc_pid_link_inode_operations;
+	inode->i_size = 64;
+
+	ei->op.proc_get_link = proc_fd_link;
+
+	d_set_d_op(dentry, &tid_fd_dentry_operations);
+	d_add(dentry, inode);
+
+	/* Close the race of the process dying before we return the dentry */
+	if (tid_fd_revalidate(dentry, NULL))
+		error = NULL;
+out:
+	return error;
+
+out_unlock:
+	spin_unlock(&files->file_lock);
+	put_files_struct(files);
+out_iput:
+	iput(inode);
+	goto out;
+}
+
+static struct dentry *
+proc_lookupfd_common(struct inode *dir, struct dentry *dentry,
+		     instantiate_t instantiate)
+{
+	struct task_struct *task = get_proc_task(dir);
+	struct dentry *result = ERR_PTR(-ENOENT);
+	unsigned fd = name_to_int(dentry);
+
+	if (!task)
+		goto out_no_task;
+	if (fd == ~0U)
+		goto out;
+
+	result = instantiate(dir, dentry, task, &fd);
+out:
+	put_task_struct(task);
+out_no_task:
+	return result;
+}
+
+static int proc_readfd_common(struct file * filp, void * dirent,
+			      filldir_t filldir, instantiate_t instantiate)
+{
+	struct dentry *dentry = filp->f_path.dentry;
+	struct inode *inode = dentry->d_inode;
+	struct task_struct *p = get_proc_task(inode);
+	struct files_struct *files;
+	unsigned int fd, ino;
+	int retval;
+
+	retval = -ENOENT;
+	if (!p)
+		goto out_no_task;
+	retval = 0;
+
+	fd = filp->f_pos;
+	switch (fd) {
+		case 0:
+			if (filldir(dirent, ".", 1, 0, inode->i_ino, DT_DIR) < 0)
+				goto out;
+			filp->f_pos++;
+		case 1:
+			ino = parent_ino(dentry);
+			if (filldir(dirent, "..", 2, 1, ino, DT_DIR) < 0)
+				goto out;
+			filp->f_pos++;
+		default:
+			files = get_files_struct(p);
+			if (!files)
+				goto out;
+			rcu_read_lock();
+			for (fd = filp->f_pos - 2;
+			     fd < files_fdtable(files)->max_fds;
+			     fd++, filp->f_pos++) {
+				char name[PROC_NUMBUF];
+				int len;
+
+				if (!fcheck_files(files, fd))
+					continue;
+				rcu_read_unlock();
+
+				len = snprintf(name, sizeof(name), "%d", fd);
+				if (proc_fill_cache(filp, dirent, filldir,
+						    name, len, instantiate,
+						    p, &fd) < 0) {
+					rcu_read_lock();
+					break;
+				}
+				rcu_read_lock();
+			}
+			rcu_read_unlock();
+			put_files_struct(files);
+	}
+out:
+	put_task_struct(p);
+out_no_task:
+	return retval;
+}
+
+static ssize_t proc_fdinfo_read(struct file *file, char __user *buf,
+				      size_t len, loff_t *ppos)
+{
+	char tmp[PROC_FDINFO_MAX];
+	int err = proc_fd_info(file->f_path.dentry->d_inode, NULL, tmp);
+	if (!err)
+		err = simple_read_from_buffer(buf, len, ppos, tmp, strlen(tmp));
+	return err;
+}
+
+static const struct file_operations proc_fdinfo_file_operations = {
+	.open           = nonseekable_open,
+	.read		= proc_fdinfo_read,
+	.llseek		= no_llseek,
+};
+
+static int proc_readfd(struct file *filp, void *dirent, filldir_t filldir)
+{
+	return proc_readfd_common(filp, dirent, filldir, proc_fd_instantiate);
+}
+
+const struct file_operations proc_fd_operations = {
+	.read		= generic_read_dir,
+	.readdir	= proc_readfd,
+	.llseek		= default_llseek,
+};
+
+static struct dentry *proc_lookupfd(struct inode *dir, struct dentry *dentry,
+				    struct nameidata *nd)
+{
+	return proc_lookupfd_common(dir, dentry, proc_fd_instantiate);
+}
+
+/*
+ * /proc/pid/fd needs a special permission handler so that a process can still
+ * access /proc/self/fd after it has executed a setuid().
+ */
+int proc_fd_permission(struct inode *inode, int mask)
+{
+	int rv = generic_permission(inode, mask);
+	if (rv == 0)
+		return 0;
+	if (task_pid(current) == proc_pid(inode))
+		rv = 0;
+	return rv;
+}
+
+const struct inode_operations proc_fd_inode_operations = {
+	.lookup		= proc_lookupfd,
+	.permission	= proc_fd_permission,
+	.setattr	= proc_setattr,
+};
+
+static struct dentry *
+proc_fdinfo_instantiate(struct inode *dir, struct dentry *dentry,
+			struct task_struct *task, const void *ptr)
+{
+	struct dentry *error = ERR_PTR(-ENOENT);
+	unsigned fd = *(unsigned *)ptr;
+	struct proc_inode *ei;
+	struct inode *inode;
+
+	inode = proc_pid_make_inode(dir->i_sb, task);
+	if (!inode)
+		goto out;
+
+	ei = PROC_I(inode);
+	ei->fd = fd;
+
+	inode->i_mode = S_IFREG | S_IRUSR;
+	inode->i_fop = &proc_fdinfo_file_operations;
+
+	d_set_d_op(dentry, &tid_fd_dentry_operations);
+	d_add(dentry, inode);
+
+	/* Close the race of the process dying before we return the dentry */
+	if (tid_fd_revalidate(dentry, NULL))
+		error = NULL;
+
+ out:
+	return error;
+}
+
+static struct dentry *
+proc_lookupfdinfo(struct inode *dir, struct dentry *dentry, struct nameidata *nd)
+{
+	return proc_lookupfd_common(dir, dentry, proc_fdinfo_instantiate);
+}
+
+static int proc_readfdinfo(struct file *filp, void *dirent, filldir_t filldir)
+{
+	return proc_readfd_common(filp, dirent, filldir,
+				  proc_fdinfo_instantiate);
+}
+
+const struct inode_operations proc_fdinfo_inode_operations = {
+	.lookup		= proc_lookupfdinfo,
+	.setattr	= proc_setattr,
+};
+
+const struct file_operations proc_fdinfo_operations = {
+	.read		= generic_read_dir,
+	.readdir	= proc_readfdinfo,
+	.llseek		= default_llseek,
+};
Index: linux-2.6.git/fs/proc/fd.h
===================================================================
--- /dev/null
+++ linux-2.6.git/fs/proc/fd.h
@@ -0,0 +1,14 @@
+#ifndef __PROCFS_FD_H__
+#define __PROCFS_FD_H__
+
+#include <linux/fs.h>
+
+extern const struct file_operations proc_fd_operations;
+extern const struct inode_operations proc_fd_inode_operations;
+
+extern const struct file_operations proc_fdinfo_operations;
+extern const struct inode_operations proc_fdinfo_inode_operations;
+
+extern int proc_fd_permission(struct inode *inode, int mask);
+
+#endif /* __PROCFS_FD_H__ */
Index: linux-2.6.git/fs/proc/internal.h
===================================================================
--- linux-2.6.git.orig/fs/proc/internal.h
+++ linux-2.6.git/fs/proc/internal.h
@@ -8,7 +8,7 @@
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
  */
-
+#include <linux/sched.h>
 #include <linux/proc_fs.h>
 struct  ctl_table_header;
 
@@ -67,6 +67,7 @@ extern const struct file_operations proc
 extern const struct file_operations proc_pagemap_operations;
 extern const struct file_operations proc_net_operations;
 extern const struct inode_operations proc_net_inode_operations;
+extern const struct inode_operations proc_pid_link_inode_operations;
 
 struct proc_maps_private {
 	struct pid *pid;
@@ -93,6 +94,52 @@ static inline int proc_fd(struct inode *
 	return PROC_I(inode)->fd;
 }
 
+static inline int task_dumpable(struct task_struct *task)
+{
+	int dumpable = 0;
+	struct mm_struct *mm;
+
+	task_lock(task);
+	mm = task->mm;
+	if (mm)
+		dumpable = get_dumpable(mm);
+	task_unlock(task);
+	if(dumpable == 1)
+		return 1;
+	return 0;
+}
+
+static inline int pid_delete_dentry(const struct dentry * dentry)
+{
+	/* Is the task we represent dead?
+	 * If so, then don't put the dentry on the lru list,
+	 * kill it immediately.
+	 */
+	return !proc_pid(dentry->d_inode)->tasks[PIDTYPE_PID].first;
+}
+
+static inline unsigned name_to_int(struct dentry *dentry)
+{
+	const char *name = dentry->d_name.name;
+	int len = dentry->d_name.len;
+	unsigned n = 0;
+
+	if (len > 1 && *name == '0')
+		goto out;
+	while (len-- > 0) {
+		unsigned c = *name++ - '0';
+		if (c > 9)
+			goto out;
+		if (n >= (~0U-9)/10)
+			goto out;
+		n *= 10;
+		n += c;
+	}
+	return n;
+out:
+	return ~0U;
+}
+
 struct dentry *proc_lookup_de(struct proc_dir_entry *de, struct inode *ino,
 		struct dentry *dentry);
 int proc_readdir_de(struct proc_dir_entry *de, struct file *filp, void *dirent,


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [rfc 2/4] procfs: Convert /proc/pid/fdinfo/fd handling routines into seq-file with fdinfo helpers
  2012-05-17 16:07 [rfc 0/4] procfs fdinfo extension Cyrill Gorcunov
  2012-05-17 16:07 ` [rfc 1/4] procfs: Move /proc/pid/fd[info] handling code to fd.[ch] Cyrill Gorcunov
@ 2012-05-17 16:07 ` Cyrill Gorcunov
  2012-05-17 16:32   ` Pavel Emelyanov
  2012-05-17 16:07 ` [rfc 3/4] fs, eventfd: Add procfs fdinfo helper Cyrill Gorcunov
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov @ 2012-05-17 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Pavel Emelyanov, James Bottomley, linux-fsdevel,
	Cyrill Gorcunov

[-- Attachment #1: seq-fdinfo-seq-ops-2 --]
[-- Type: text/plain, Size: 9469 bytes --]

This patch converts /proc/pid/fdinfo/fd handling routines to seq-files with
ability to plug in additional fdinfo helpers from other subsystems.

For example files created in eventfd/eventpoll/inotify subsystems might print
out additional information, not just file position and flags.

Still if there no fdinfo helpers registered then the output is well known
pos, flags pair.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 fs/proc/fd.c            |  248 +++++++++++++++++++++++++++++++++++++-----------
 fs/proc/fd.h            |    1 
 include/linux/proc_fs.h |   31 ++++++
 3 files changed, 224 insertions(+), 56 deletions(-)

Index: linux-2.6.git/fs/proc/fd.c
===================================================================
--- linux-2.6.git.orig/fs/proc/fd.c
+++ linux-2.6.git/fs/proc/fd.c
@@ -6,66 +6,194 @@
 #include <linux/namei.h>
 #include <linux/pid.h>
 #include <linux/security.h>
+#include <linux/file.h>
+#include <linux/seq_file.h>
+#include <linux/spinlock.h>
 
 #include <linux/proc_fs.h>
 
 #include "internal.h"
+#include "fd.h"
 
-#define PROC_FDINFO_MAX 64
+static LIST_HEAD(fdinfo_helpers);
+static DEFINE_RWLOCK(fdinfo_helpers_lock);
 
-static int proc_fd_info(struct inode *inode, struct path *path, char *info)
+int proc_register_fdinfo_helper(struct proc_fdinfo_helper *helper)
 {
-	struct task_struct *task = get_proc_task(inode);
-	struct files_struct *files = NULL;
-	int fd = proc_fd(inode);
-	struct file *file;
+	struct proc_fdinfo_helper *item;
+	int ret = 0;
 
-	if (task) {
-		files = get_files_struct(task);
-		put_task_struct(task);
+	if (!helper->ops || !helper->probe)
+		return -EINVAL;
+
+	write_lock(&fdinfo_helpers_lock);
+	list_for_each_entry(item, &fdinfo_helpers, list) {
+		if (item == helper) {
+			pr_err("procfs fdinfo helper `%s' is already registered\n",
+			       item->name);
+			ret = -EINVAL;
+			break;
+		}
 	}
+	if (!ret)
+		list_add(&helper->list, &fdinfo_helpers);
+	write_unlock(&fdinfo_helpers_lock);
+	return ret;
+}
 
-	if (files) {
-		/*
-		 * We are not taking a ref to the file structure,
-		 * so we must hold ->file_lock.
-		 */
-		spin_lock(&files->file_lock);
-		file = fcheck_files(files, fd);
+void proc_unregister_fdinfo_helper(struct proc_fdinfo_helper *helper)
+{
+	struct proc_fdinfo_helper *item;
 
-		if (file) {
-			unsigned int f_flags;
-			struct fdtable *fdt;
+	write_lock(&fdinfo_helpers_lock);
+	list_for_each_entry(item, &fdinfo_helpers, list) {
+		if (item == helper) {
+			list_del(&item->list);
+			break;
+		}
+	}
+	write_unlock(&fdinfo_helpers_lock);
+}
 
-			fdt = files_fdtable(files);
-			f_flags = file->f_flags & ~O_CLOEXEC;
-			if (close_on_exec(fd, fdt))
-				f_flags |= O_CLOEXEC;
-
-			if (path) {
-				*path = file->f_path;
-				path_get(&file->f_path);
-			}
+static void assign_fdinfo_helper(struct proc_fdinfo_extra *extra)
+{
+	struct proc_fdinfo_helper *item;
+	read_lock(&fdinfo_helpers_lock);
+	list_for_each_entry(item, &fdinfo_helpers, list) {
+		if (item->probe(extra->fd_file)) {
+			extra->helper = item;
+			break;
+		}
+	}
+	read_unlock(&fdinfo_helpers_lock);
+}
+
+static void *seq_start(struct seq_file *m, loff_t *pos)
+{
+	struct proc_fdinfo_extra *extra = m->private;
+
+	read_lock(&fdinfo_helpers_lock);
+
+	if (*pos > 0)
+		extra->hdr_shown = true;
+
+	return *pos == 0 ? extra :
+		(extra->helper ? extra->helper->ops->start(m, pos) : NULL);
+}
+
+static void seq_stop(struct seq_file *m, void *v)
+{
+	struct proc_fdinfo_extra *extra = m->private;
+
+	if (extra->helper && extra->hdr_shown)
+		extra->helper->ops->stop(m, v);
+
+	read_unlock(&fdinfo_helpers_lock);
+}
+
+static void *seq_next(struct seq_file *m, void *p, loff_t *pos)
+{
+	struct proc_fdinfo_extra *extra = m->private;
+
+	if (extra->helper)
+		return extra->helper->ops->next(m, p, pos);
+
+	++*pos;
+	return NULL;
+}
+
+static int seq_show(struct seq_file *m, void *v)
+{
+	struct proc_fdinfo_extra *extra = m->private;
+
+	if (extra->helper && extra->hdr_shown)
+		return extra->helper->ops->show(m, v);
+
+	seq_printf(m, "pos:\t%lli\nflags:\t0%o\n",
+		   (long long)extra->fd_file->f_pos,
+		   extra->f_flags);
+
+	extra->hdr_shown = 1;
+	return 0;
+}
+
+static const struct seq_operations fdinfo_seq_ops = {
+	.start	= seq_start,
+	.next	= seq_next,
+	.stop	= seq_stop,
+	.show	= seq_show,
+};
 
-			if (info)
-				snprintf(info, PROC_FDINFO_MAX,
-					 "pos:\t%lli\n"
-					 "flags:\t0%o\n",
-					 (long long)file->f_pos,
-					 f_flags);
+static int seq_fdinfo_open(struct inode *inode, struct file *file)
+{
+	struct files_struct *files = NULL;
+	struct proc_fdinfo_extra *extra;
+	struct task_struct *task;
+	struct seq_file *m;
+	int ret;
+
+	extra = kzalloc(sizeof(*extra), GFP_KERNEL);
+	if (!extra)
+		return -ENOMEM;
+	extra->inode = inode;
+
+	ret = seq_open(file, &fdinfo_seq_ops);
+	if (!ret) {
+		ret = -ENOENT;
+		m = file->private_data;
+		m->private = extra;
+
+		task = get_proc_task(inode);
+		if (task) {
+			files = get_files_struct(task);
+			put_task_struct(task);
+		}
+
+		if (files) {
+			int fd = proc_fd(inode);
 
+			spin_lock(&files->file_lock);
+			extra->fd_file = fcheck_files(files, fd);
+			if (extra->fd_file) {
+				struct fdtable *fdt = files_fdtable(files);
+
+				extra->f_flags = extra->fd_file->f_flags & ~O_CLOEXEC;
+				if (close_on_exec(fd, fdt))
+					extra->f_flags |= O_CLOEXEC;
+				get_file(extra->fd_file);
+				ret = 0;
+			}
 			spin_unlock(&files->file_lock);
 			put_files_struct(files);
-			return 0;
-		}
 
-		spin_unlock(&files->file_lock);
-		put_files_struct(files);
+			if (extra->fd_file)
+				assign_fdinfo_helper(extra);
+		}
 	}
 
-	return -ENOENT;
+	if (ret)
+		kfree(extra);
+	return ret;
 }
 
+static int seq_fdinfo_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *m = file->private_data;
+	struct proc_fdinfo_extra *extra = m->private;
+
+	put_filp(extra->fd_file);
+	kfree(m->private);
+
+	return seq_release(inode, file);
+}
+
+static const struct file_operations proc_fdinfo_file_operations = {
+	.open		= seq_fdinfo_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_fdinfo_release,
+};
+
 static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
 {
 	struct files_struct *files;
@@ -120,7 +248,31 @@ static const struct dentry_operations ti
 
 static int proc_fd_link(struct dentry *dentry, struct path *path)
 {
-	return proc_fd_info(dentry->d_inode, path, NULL);
+	struct inode *inode = dentry->d_inode;
+	struct task_struct *task = get_proc_task(inode);
+	struct files_struct *files = NULL;
+	int fd = proc_fd(inode);
+	struct file *file;
+	int err = -ENOENT;
+
+	if (task) {
+		files = get_files_struct(task);
+		put_task_struct(task);
+	}
+
+	if (files) {
+		spin_lock(&files->file_lock);
+		file = fcheck_files(files, fd);
+		if (file) {
+			*path = file->f_path;
+			path_get(&file->f_path);
+		}
+		spin_unlock(&files->file_lock);
+		put_files_struct(files);
+		err = 0;
+	}
+
+	return err;
 }
 
 static struct dentry *
@@ -263,22 +415,6 @@ out_no_task:
 	return retval;
 }
 
-static ssize_t proc_fdinfo_read(struct file *file, char __user *buf,
-				      size_t len, loff_t *ppos)
-{
-	char tmp[PROC_FDINFO_MAX];
-	int err = proc_fd_info(file->f_path.dentry->d_inode, NULL, tmp);
-	if (!err)
-		err = simple_read_from_buffer(buf, len, ppos, tmp, strlen(tmp));
-	return err;
-}
-
-static const struct file_operations proc_fdinfo_file_operations = {
-	.open           = nonseekable_open,
-	.read		= proc_fdinfo_read,
-	.llseek		= no_llseek,
-};
-
 static int proc_readfd(struct file *filp, void *dirent, filldir_t filldir)
 {
 	return proc_readfd_common(filp, dirent, filldir, proc_fd_instantiate);
Index: linux-2.6.git/fs/proc/fd.h
===================================================================
--- linux-2.6.git.orig/fs/proc/fd.h
+++ linux-2.6.git/fs/proc/fd.h
@@ -1,6 +1,7 @@
 #ifndef __PROCFS_FD_H__
 #define __PROCFS_FD_H__
 
+#include <linux/types.h>
 #include <linux/fs.h>
 
 extern const struct file_operations proc_fd_operations;
Index: linux-2.6.git/include/linux/proc_fs.h
===================================================================
--- linux-2.6.git.orig/include/linux/proc_fs.h
+++ linux-2.6.git/include/linux/proc_fs.h
@@ -100,6 +100,25 @@ struct vmcore {
 	loff_t offset;
 };
 
+struct seq_operations;
+struct proc_fdinfo_helper {
+	struct list_head		list;
+	const char			*name;
+	const struct seq_operations	*ops;
+	int				(*probe)(struct file *file);
+	int				(*init)(void *private);
+	void				(*fini)(void *private);
+};
+
+struct proc_fdinfo_extra {
+	struct proc_fdinfo_helper	*helper;
+	struct inode			*inode;
+	struct file			*fd_file;
+	unsigned int			f_flags;
+	void				*private;
+	int				hdr_shown;
+};
+
 #ifdef CONFIG_PROC_FS
 
 extern void proc_root_init(void);
@@ -175,6 +194,9 @@ extern struct proc_dir_entry *proc_net_m
 
 extern struct file *proc_ns_fget(int fd);
 
+extern int proc_register_fdinfo_helper(struct proc_fdinfo_helper *helper);
+extern void proc_unregister_fdinfo_helper(struct proc_fdinfo_helper *helper);
+
 #else
 
 #define proc_net_fops_create(net, name, mode, fops)  ({ (void)(mode), NULL; })
@@ -229,6 +251,15 @@ static inline struct file *proc_ns_fget(
 	return ERR_PTR(-EINVAL);
 }
 
+static inline int proc_register_fdinfo_helper(struct proc_fdinfo_helper *helper)
+{
+	return -EINVAL;
+}
+
+static inline void proc_unregister_fdinfo_helper(struct proc_fdinfo_helper *helper)
+{
+}
+
 #endif /* CONFIG_PROC_FS */
 
 #if !defined(CONFIG_PROC_KCORE)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [rfc 3/4] fs, eventfd: Add procfs fdinfo helper
  2012-05-17 16:07 [rfc 0/4] procfs fdinfo extension Cyrill Gorcunov
  2012-05-17 16:07 ` [rfc 1/4] procfs: Move /proc/pid/fd[info] handling code to fd.[ch] Cyrill Gorcunov
  2012-05-17 16:07 ` [rfc 2/4] procfs: Convert /proc/pid/fdinfo/fd handling routines into seq-file with fdinfo helpers Cyrill Gorcunov
@ 2012-05-17 16:07 ` Cyrill Gorcunov
  2012-05-17 16:34   ` Pavel Emelyanov
  2012-05-17 16:07 ` [rfc 4/4] fs, epoll: " Cyrill Gorcunov
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov @ 2012-05-17 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Pavel Emelyanov, James Bottomley, linux-fsdevel,
	Cyrill Gorcunov

[-- Attachment #1: seq-fdinfo-eventfd-2 --]
[-- Type: text/plain, Size: 2039 bytes --]

This allow us to print out raw counter value.
So the /proc/pid/fdinfo/fd look like

 | pos:	0
 | flags:	04002
 | eventfd-count:               5a

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 fs/eventfd.c |   57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

Index: linux-2.6.git/fs/eventfd.c
===================================================================
--- linux-2.6.git.orig/fs/eventfd.c
+++ linux-2.6.git/fs/eventfd.c
@@ -19,6 +19,8 @@
 #include <linux/export.h>
 #include <linux/kref.h>
 #include <linux/eventfd.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
 
 struct eventfd_ctx {
 	struct kref kref;
@@ -437,3 +439,58 @@ SYSCALL_DEFINE1(eventfd, unsigned int, c
 	return sys_eventfd2(count, 0);
 }
 
+#ifdef CONFIG_PROC_FS
+
+static void *seq_start(struct seq_file *m, loff_t *pos)
+{
+	struct proc_fdinfo_extra *extra = m->private;
+	return *pos == 1 ? extra->fd_file : NULL;
+}
+
+static void seq_stop(struct seq_file *m, void *v)
+{
+}
+
+static void *seq_next(struct seq_file *m, void *p, loff_t *pos)
+{
+	struct proc_fdinfo_extra *extra = m->private;
+	return ++*pos == 1 ? extra->fd_file : NULL;
+}
+
+static int seq_show(struct seq_file *m, void *v)
+{
+	struct eventfd_ctx *ctx = ((struct file *)v)->private_data;
+
+	spin_lock_irq(&ctx->wqh.lock);
+	seq_printf(m, "eventfd-count: %16llx\n",
+		   (unsigned long long)ctx->count);
+	spin_unlock_irq(&ctx->wqh.lock);
+
+	return 0;
+}
+
+static const struct seq_operations eventfd_fdinfo_ops = {
+	.start	= seq_start,
+	.next	= seq_next,
+	.stop	= seq_stop,
+	.show	= seq_show,
+};
+
+static int eventfd_fdinfo_probe(struct file *file)
+{
+	return file->f_op == &eventfd_fops;
+}
+
+static struct proc_fdinfo_helper eventfd_fdinfo_helper = {
+	.name	= "evetfd",
+	.ops	= &eventfd_fdinfo_ops,
+	.probe	= eventfd_fdinfo_probe,
+};
+
+static int __init eventfd_init(void)
+{
+	return proc_register_fdinfo_helper(&eventfd_fdinfo_helper);
+}
+fs_initcall(eventfd_init);
+
+#endif /* CONFIG_PROC_FS */


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [rfc 4/4] fs, epoll: Add procfs fdinfo helper
  2012-05-17 16:07 [rfc 0/4] procfs fdinfo extension Cyrill Gorcunov
                   ` (2 preceding siblings ...)
  2012-05-17 16:07 ` [rfc 3/4] fs, eventfd: Add procfs fdinfo helper Cyrill Gorcunov
@ 2012-05-17 16:07 ` Cyrill Gorcunov
  2012-05-17 16:29 ` [rfc 0/4] procfs fdinfo extension Pavel Emelyanov
  2012-05-17 19:05 ` Andrew Morton
  5 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov @ 2012-05-17 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Pavel Emelyanov, James Bottomley, linux-fsdevel,
	Cyrill Gorcunov

[-- Attachment #1: seq-fdinfo-eventpoll-2 --]
[-- Type: text/plain, Size: 3076 bytes --]

This allow us to print out eventpoll target file descriptor,
events and data, so the /proc/pid/fdinfo/fd consists of

 | pos:	0
 | flags:	02
 | tfd:        5 events:       1d data: ffffffffffffffff

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 fs/eventpoll.c |   82 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

Index: linux-2.6.git/fs/eventpoll.c
===================================================================
--- linux-2.6.git.orig/fs/eventpoll.c
+++ linux-2.6.git/fs/eventpoll.c
@@ -37,6 +37,8 @@
 #include <asm/io.h>
 #include <asm/mman.h>
 #include <linux/atomic.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
 
 /*
  * LOCKING:
@@ -1815,6 +1817,84 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 
 #endif /* HAVE_SET_RESTORE_SIGMASK */
 
+#ifdef CONFIG_PROC_FS
+
+#define FDINFO_SEQ_UNLOCKED	(void *)(0L)
+#define FDINFO_SEQ_LOCKED	(void *)(1L)
+
+static struct epitem *
+seq_lookup_epi(struct proc_fdinfo_extra *extra, struct eventpoll *ep, loff_t num)
+{
+	struct rb_node *rbp;
+
+	if (extra->private == FDINFO_SEQ_UNLOCKED) {
+		mutex_lock(&ep->mtx);
+		extra->private = FDINFO_SEQ_LOCKED;
+	}
+
+	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
+		if (num-- == 0)
+			return rb_entry(rbp, struct epitem, rbn);
+	}
+
+	return NULL;
+}
+
+static void *seq_start(struct seq_file *m, loff_t *pos)
+{
+	struct proc_fdinfo_extra *extra = m->private;
+	struct eventpoll *ep = extra->fd_file->private_data;
+
+	return *pos ? seq_lookup_epi(extra, ep, *pos - 1) : NULL;
+}
+
+static void seq_stop(struct seq_file *m, void *v)
+{
+	struct proc_fdinfo_extra *extra = m->private;
+	struct eventpoll *ep = extra->fd_file->private_data;
+
+	if (extra->private == FDINFO_SEQ_LOCKED)
+		mutex_unlock(&ep->mtx);
+	extra->private = FDINFO_SEQ_UNLOCKED;
+}
+
+static void *seq_next(struct seq_file *m, void *p, loff_t *pos)
+{
+	struct proc_fdinfo_extra *extra = m->private;
+	struct eventpoll *ep = extra->fd_file->private_data;
+	++*pos;
+	return (void *)seq_lookup_epi(extra, ep, *pos - 1);
+}
+
+static int seq_show(struct seq_file *m, void *v)
+{
+	struct epitem *epi = v;
+	seq_printf(m, "tfd: %8d events: %8x data: %16llx\n",
+		   epi->ffd.fd, epi->event.events, (long long)epi->event.data);
+	return 0;
+}
+
+static const struct seq_operations ep_fdinfo_ops = {
+	.start	= seq_start,
+	.next	= seq_next,
+	.stop	= seq_stop,
+	.show	= seq_show,
+};
+
+static struct proc_fdinfo_helper ep_fdinfo_helper = {
+	.name	= "epoll",
+	.ops	= &ep_fdinfo_ops,
+	.probe	= is_file_epoll,
+};
+
+static int __init ep_register_fdinfo_helper(void)
+{
+	return proc_register_fdinfo_helper(&ep_fdinfo_helper);
+}
+#else
+static void ep_register_fdinfo_helper(void) { }
+#endif /* CONFIG_PROC_FS */
+
 static int __init eventpoll_init(void)
 {
 	struct sysinfo si;
@@ -1847,6 +1927,8 @@ static int __init eventpoll_init(void)
 	pwq_cache = kmem_cache_create("eventpoll_pwq",
 			sizeof(struct eppoll_entry), 0, SLAB_PANIC, NULL);
 
+	ep_register_fdinfo_helper();
+
 	return 0;
 }
 fs_initcall(eventpoll_init);


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [rfc 0/4] procfs fdinfo extension
  2012-05-17 16:07 [rfc 0/4] procfs fdinfo extension Cyrill Gorcunov
                   ` (3 preceding siblings ...)
  2012-05-17 16:07 ` [rfc 4/4] fs, epoll: " Cyrill Gorcunov
@ 2012-05-17 16:29 ` Pavel Emelyanov
  2012-05-17 19:05 ` Andrew Morton
  5 siblings, 0 replies; 14+ messages in thread
From: Pavel Emelyanov @ 2012-05-17 16:29 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, Andrew Morton, James Bottomley, linux-fsdevel

> (I've been said that idea of extending fdinfo to provide
>  additional data snippets was approved on LSF this year.
>  Am I right, Pavel?)

Yes, you are.

> Any comments and complains are highly appreciated!
> 
> 	Cyrill
> .
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [rfc 2/4] procfs: Convert /proc/pid/fdinfo/fd handling routines into seq-file with fdinfo helpers
  2012-05-17 16:07 ` [rfc 2/4] procfs: Convert /proc/pid/fdinfo/fd handling routines into seq-file with fdinfo helpers Cyrill Gorcunov
@ 2012-05-17 16:32   ` Pavel Emelyanov
  2012-05-17 17:38     ` Cyrill Gorcunov
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Emelyanov @ 2012-05-17 16:32 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, Andrew Morton, James Bottomley, linux-fsdevel

On 05/17/2012 08:07 PM, Cyrill Gorcunov wrote:
> This patch converts /proc/pid/fdinfo/fd handling routines to seq-files with
> ability to plug in additional fdinfo helpers from other subsystems.
> 
> For example files created in eventfd/eventpoll/inotify subsystems might print
> out additional information, not just file position and flags.
> 
> Still if there no fdinfo helpers registered then the output is well known
> pos, flags pair.
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>

I would split this into two parts -- the 1st one turns fdinfo into seq-files
and the 2nd one adds the fdinfo helpers. This will make the review MUCH simpler.

Thanks,
Pavel


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [rfc 3/4] fs, eventfd: Add procfs fdinfo helper
  2012-05-17 16:07 ` [rfc 3/4] fs, eventfd: Add procfs fdinfo helper Cyrill Gorcunov
@ 2012-05-17 16:34   ` Pavel Emelyanov
  2012-05-17 17:43     ` Cyrill Gorcunov
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Emelyanov @ 2012-05-17 16:34 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, Andrew Morton, James Bottomley, linux-fsdevel

> +static void *seq_start(struct seq_file *m, loff_t *pos)
> +{
> +	struct proc_fdinfo_extra *extra = m->private;
> +	return *pos == 1 ? extra->fd_file : NULL;
> +}
> +
> +static void seq_stop(struct seq_file *m, void *v)
> +{
> +}
> +
> +static void *seq_next(struct seq_file *m, void *p, loff_t *pos)
> +{
> +	struct proc_fdinfo_extra *extra = m->private;
> +	return ++*pos == 1 ? extra->fd_file : NULL;
> +}
> +

<snip>

> +static const struct seq_operations eventfd_fdinfo_ops = {
> +	.start	= seq_start,
> +	.next	= seq_next,
> +	.stop	= seq_stop,
> +	.show	= seq_show,

I think, you can use the single_ part of the seq files engine.

> +};

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [rfc 2/4] procfs: Convert /proc/pid/fdinfo/fd handling routines into seq-file with fdinfo helpers
  2012-05-17 16:32   ` Pavel Emelyanov
@ 2012-05-17 17:38     ` Cyrill Gorcunov
  2012-05-17 21:49       ` Cyrill Gorcunov
  0 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov @ 2012-05-17 17:38 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: linux-kernel, Andrew Morton, James Bottomley, linux-fsdevel

On Thu, May 17, 2012 at 08:32:04PM +0400, Pavel Emelyanov wrote:
> 
> I would split this into two parts -- the 1st one turns fdinfo into seq-files
> and the 2nd one adds the fdinfo helpers. This will make the review MUCH simpler.

Yeah, will do and send. Thanks.

	Cyrill

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [rfc 3/4] fs, eventfd: Add procfs fdinfo helper
  2012-05-17 16:34   ` Pavel Emelyanov
@ 2012-05-17 17:43     ` Cyrill Gorcunov
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov @ 2012-05-17 17:43 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: linux-kernel, Andrew Morton, James Bottomley, linux-fsdevel

On Thu, May 17, 2012 at 08:34:59PM +0400, Pavel Emelyanov wrote:
> <snip>
> 
> > +static const struct seq_operations eventfd_fdinfo_ops = {
> > +	.start	= seq_start,
> > +	.next	= seq_next,
> > +	.stop	= seq_stop,
> > +	.show	= seq_show,
> 
> I think, you can use the single_ part of the seq files engine.

single_stop only, since for other cases this seq methods should
provide data iif pos is 1.

	Cyrill

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [rfc 0/4] procfs fdinfo extension
  2012-05-17 16:07 [rfc 0/4] procfs fdinfo extension Cyrill Gorcunov
                   ` (4 preceding siblings ...)
  2012-05-17 16:29 ` [rfc 0/4] procfs fdinfo extension Pavel Emelyanov
@ 2012-05-17 19:05 ` Andrew Morton
  2012-05-17 19:28   ` Cyrill Gorcunov
  2012-05-18 11:53   ` Pavel Emelyanov
  5 siblings, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2012-05-17 19:05 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, Pavel Emelyanov, James Bottomley, linux-fsdevel

On Thu, 17 May 2012 20:07:38 +0400
Cyrill Gorcunov <gorcunov@openvz.org> wrote:

> when we do restore files such as eventfd/eventpoll we need to pass
> appropriate parameters to system calls.

What does "such as" mean?  Provide the whole list, please.  I assume
we're going to have to add ~100 lines of stuff to each and every one? 
Stuff which, according to this patchset, is needed even when
CONFIG_CHECKPOINT_RESTORE=n?

My reason for disliking our whole approach to integration of c/r is
that it exposes us to an ongoing trickle of nasty surprises.  This
patchset is one such nasty surprise, and we don't even know how
extensive this particular surprise will be.

And how many more surprises are we going to get?


I'm quite apprehensive about this, largely because it has so many
unknowns.  How much work would it be to prepare a full list of
everything that still needs to be done to fully implement c/r in Linux?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [rfc 0/4] procfs fdinfo extension
  2012-05-17 19:05 ` Andrew Morton
@ 2012-05-17 19:28   ` Cyrill Gorcunov
  2012-05-18 11:53   ` Pavel Emelyanov
  1 sibling, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov @ 2012-05-17 19:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Pavel Emelyanov, James Bottomley, linux-fsdevel

On Thu, May 17, 2012 at 12:05:41PM -0700, Andrew Morton wrote:
> On Thu, 17 May 2012 20:07:38 +0400
> Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> 
> > when we do restore files such as eventfd/eventpoll we need to pass
> > appropriate parameters to system calls.
> 
> What does "such as" mean? Provide the whole list, please.  I assume
> we're going to have to add ~100 lines of stuff to each and every one? 
> Stuff which, according to this patchset, is needed even when
> CONFIG_CHECKPOINT_RESTORE=n?

By such as I meant eventfd/eventpoll and fsnotify. And Andrew, I don't
know the whole list yet, if I knew it I would definitely share it. But
as I said I'm trying really hard to minimize patch impact on mainline kernel.
With this particular set (well, inotify part needed, but I didn't send
it out yet simple because I find it not well done for mainline inclusion,
but its draft variant lives in our kernel repo on github and I'm using
it alot for testing c/r on various programs) we're able to c/r crond/httpd.

And I didn't wrap this code with CONFIG_CHECKPOINT_RESTORE yet to
simplify the patch and gather people opinion on design in general.
I'll add it later (don't worry I remember about it).

Still plain conversion of fdinfo to seq-files and move code to fd.c
I think is an improvement in readability (the proc/base.c is a _way_
big file and I think it should be splitted by small steps).

> My reason for disliking our whole approach to integration of c/r is
> that it exposes us to an ongoing trickle of nasty surprises.  This
> patchset is one such nasty surprise, and we don't even know how
> extensive this particular surprise will be.

Why it's nasty? We gather back information from kernel needed for
use with syscalls to restore kernel entities (in case of this patchset
we restore eventfd and eventpolls).

> And how many more surprises are we going to get?
> 
> I'm quite apprehensive about this, largely because it has so many
> unknowns.  How much work would it be to prepare a full list of
> everything that still needs to be done to fully implement c/r in Linux?

Well, to build the whole list... hmm, I think I need to talk to Pavel.

	Cyrill

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [rfc 2/4] procfs: Convert /proc/pid/fdinfo/fd handling routines into seq-file with fdinfo helpers
  2012-05-17 17:38     ` Cyrill Gorcunov
@ 2012-05-17 21:49       ` Cyrill Gorcunov
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov @ 2012-05-17 21:49 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: linux-kernel, Andrew Morton, James Bottomley, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 518 bytes --]

On Thu, May 17, 2012 at 09:38:48PM +0400, Cyrill Gorcunov wrote:
> On Thu, May 17, 2012 at 08:32:04PM +0400, Pavel Emelyanov wrote:
> > 
> > I would split this into two parts -- the 1st one turns fdinfo into seq-files
> > and the 2nd one adds the fdinfo helpers. This will make the review MUCH simpler.
> 
> Yeah, will do and send. Thanks.
> 

Here are two patches, hope such split will help in review procedure.
(I pushed them on github as well, so the complete fd.c can be seen
 here http://goo.gl/mt4q8 ).

	Cyrill

[-- Attachment #2: seq-fdinfo-seq-ops-3 --]
[-- Type: text/plain, Size: 4732 bytes --]

From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: procfs: Convert /proc/pid/fdinfo/fd handling routines into seq-file

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 fs/proc/fd.c |  136 ++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 84 insertions(+), 52 deletions(-)

Index: linux-2.6.git/fs/proc/fd.c
===================================================================
--- linux-2.6.git.orig/fs/proc/fd.c
+++ linux-2.6.git/fs/proc/fd.c
@@ -6,66 +6,90 @@
 #include <linux/namei.h>
 #include <linux/pid.h>
 #include <linux/security.h>
+#include <linux/file.h>
+#include <linux/seq_file.h>
 
 #include <linux/proc_fs.h>
 
 #include "internal.h"
+#include "fd.h"
 
-#define PROC_FDINFO_MAX 64
+struct proc_fdinfo {
+	loff_t	f_pos;
+	int	f_flags;
+};
 
-static int proc_fd_info(struct inode *inode, struct path *path, char *info)
+static int seq_show(struct seq_file *m, void *v)
+{
+	struct proc_fdinfo *fdinfo = m->private;
+	seq_printf(m, "pos:\t%lli\nflags:\t0%o\n",
+		   (long long)fdinfo->f_pos,
+		   fdinfo->f_flags);
+	return 0;
+}
+
+static int seq_fdinfo_open(struct inode *inode, struct file *file)
 {
-	struct task_struct *task = get_proc_task(inode);
 	struct files_struct *files = NULL;
-	int fd = proc_fd(inode);
-	struct file *file;
+	struct proc_fdinfo *fdinfo = NULL;
+	struct task_struct *task;
+	int ret = -ENOENT;
+
+	fdinfo = kzalloc(sizeof(*fdinfo), GFP_KERNEL);
+	if (!fdinfo)
+		return -ENOMEM;
 
+	task = get_proc_task(inode);
 	if (task) {
 		files = get_files_struct(task);
 		put_task_struct(task);
 	}
 
 	if (files) {
-		/*
-		 * We are not taking a ref to the file structure,
-		 * so we must hold ->file_lock.
-		 */
-		spin_lock(&files->file_lock);
-		file = fcheck_files(files, fd);
+		int fd = proc_fd(inode);
+		struct file *fd_file;
 
-		if (file) {
-			unsigned int f_flags;
-			struct fdtable *fdt;
+		spin_lock(&files->file_lock);
+		fd_file = fcheck_files(files, fd);
+		if (fd_file) {
+			struct fdtable *fdt = files_fdtable(files);
 
-			fdt = files_fdtable(files);
-			f_flags = file->f_flags & ~O_CLOEXEC;
+			fdinfo->f_flags = fd_file->f_flags & ~O_CLOEXEC;
 			if (close_on_exec(fd, fdt))
-				f_flags |= O_CLOEXEC;
-
-			if (path) {
-				*path = file->f_path;
-				path_get(&file->f_path);
-			}
-
-			if (info)
-				snprintf(info, PROC_FDINFO_MAX,
-					 "pos:\t%lli\n"
-					 "flags:\t0%o\n",
-					 (long long)file->f_pos,
-					 f_flags);
-
-			spin_unlock(&files->file_lock);
-			put_files_struct(files);
-			return 0;
+				fdinfo->f_flags |= O_CLOEXEC;
+			ret = 0;
 		}
-
 		spin_unlock(&files->file_lock);
 		put_files_struct(files);
 	}
 
-	return -ENOENT;
+	if (!ret) {
+		ret = single_open(file, seq_show, fdinfo);
+		if (!ret)
+			fdinfo = NULL;
+	}
+
+	kfree(fdinfo);
+	return ret;
 }
 
+static int seq_fdinfo_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *m = file->private_data;
+	struct proc_fdinfo *fdinfo = m->private;
+
+	kfree(fdinfo);
+
+	return single_release(inode, file);
+}
+
+static const struct file_operations proc_fdinfo_file_operations = {
+	.open		= seq_fdinfo_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_fdinfo_release,
+};
+
 static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
 {
 	struct files_struct *files;
@@ -120,7 +144,31 @@ static const struct dentry_operations ti
 
 static int proc_fd_link(struct dentry *dentry, struct path *path)
 {
-	return proc_fd_info(dentry->d_inode, path, NULL);
+	struct inode *inode = dentry->d_inode;
+	struct task_struct *task = get_proc_task(inode);
+	struct files_struct *files = NULL;
+	int fd = proc_fd(inode);
+	struct file *file;
+	int err = -ENOENT;
+
+	if (task) {
+		files = get_files_struct(task);
+		put_task_struct(task);
+	}
+
+	if (files) {
+		spin_lock(&files->file_lock);
+		file = fcheck_files(files, fd);
+		if (file) {
+			*path = file->f_path;
+			path_get(&file->f_path);
+		}
+		spin_unlock(&files->file_lock);
+		put_files_struct(files);
+		err = 0;
+	}
+
+	return err;
 }
 
 static struct dentry *
@@ -263,22 +311,6 @@ out_no_task:
 	return retval;
 }
 
-static ssize_t proc_fdinfo_read(struct file *file, char __user *buf,
-				      size_t len, loff_t *ppos)
-{
-	char tmp[PROC_FDINFO_MAX];
-	int err = proc_fd_info(file->f_path.dentry->d_inode, NULL, tmp);
-	if (!err)
-		err = simple_read_from_buffer(buf, len, ppos, tmp, strlen(tmp));
-	return err;
-}
-
-static const struct file_operations proc_fdinfo_file_operations = {
-	.open           = nonseekable_open,
-	.read		= proc_fdinfo_read,
-	.llseek		= no_llseek,
-};
-
 static int proc_readfd(struct file *filp, void *dirent, filldir_t filldir)
 {
 	return proc_readfd_common(filp, dirent, filldir, proc_fd_instantiate);

[-- Attachment #3: seq-fdinfo-seq-ops-helpers --]
[-- Type: text/plain, Size: 7266 bytes --]

From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: procfs: Add ability to plugin auxiliary fdinfo providers

This patch brings ability to plugin auxiliary fdinfo providers.
For example in further patches eventfd, evenpoll and fsnotify
will print out additional information associated with files
being inspected after traditional pos,flags pair.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 fs/proc/fd.c            |  186 +++++++++++++++++++++++++++++++++++++-----------
 include/linux/proc_fs.h |   23 +++++
 2 files changed, 168 insertions(+), 41 deletions(-)

Index: linux-2.6.git/fs/proc/fd.c
===================================================================
--- linux-2.6.git.orig/fs/proc/fd.c
+++ linux-2.6.git/fs/proc/fd.c
@@ -8,79 +8,183 @@
 #include <linux/security.h>
 #include <linux/file.h>
 #include <linux/seq_file.h>
+#include <linux/spinlock.h>
 
 #include <linux/proc_fs.h>
 
 #include "internal.h"
 #include "fd.h"
 
-struct proc_fdinfo {
-	loff_t	f_pos;
-	int	f_flags;
-};
+static LIST_HEAD(fdinfo_helpers);
+static DEFINE_RWLOCK(fdinfo_helpers_lock);
+
+int proc_register_fdinfo_helper(struct proc_fdinfo_helper *helper)
+{
+	struct proc_fdinfo_helper *item;
+	int ret = 0;
+
+	if (!helper->ops || !helper->probe)
+		return -EINVAL;
+
+	write_lock(&fdinfo_helpers_lock);
+	list_for_each_entry(item, &fdinfo_helpers, list) {
+		if (item == helper) {
+			pr_err("procfs fdinfo helper `%s' is already registered\n",
+			       item->name);
+			ret = -EINVAL;
+			break;
+		}
+	}
+	if (!ret)
+		list_add(&helper->list, &fdinfo_helpers);
+	write_unlock(&fdinfo_helpers_lock);
+	return ret;
+}
+
+void proc_unregister_fdinfo_helper(struct proc_fdinfo_helper *helper)
+{
+	struct proc_fdinfo_helper *item;
+
+	write_lock(&fdinfo_helpers_lock);
+	list_for_each_entry(item, &fdinfo_helpers, list) {
+		if (item == helper) {
+			list_del(&item->list);
+			break;
+		}
+	}
+	write_unlock(&fdinfo_helpers_lock);
+}
+
+static void assign_fdinfo_helper(struct proc_fdinfo_extra *extra)
+{
+	struct proc_fdinfo_helper *item;
+	read_lock(&fdinfo_helpers_lock);
+	list_for_each_entry(item, &fdinfo_helpers, list) {
+		if (item->probe(extra->fd_file)) {
+			extra->helper = item;
+			break;
+		}
+	}
+	read_unlock(&fdinfo_helpers_lock);
+}
+
+static void *seq_start(struct seq_file *m, loff_t *pos)
+{
+	struct proc_fdinfo_extra *extra = m->private;
+
+	read_lock(&fdinfo_helpers_lock);
+
+	if (*pos > 0)
+		extra->hdr_shown = true;
+
+	return *pos == 0 ? extra :
+		(extra->helper ? extra->helper->ops->start(m, pos) : NULL);
+}
+
+static void seq_stop(struct seq_file *m, void *v)
+{
+	struct proc_fdinfo_extra *extra = m->private;
+
+	if (extra->helper && extra->hdr_shown)
+		extra->helper->ops->stop(m, v);
+
+	read_unlock(&fdinfo_helpers_lock);
+}
+
+static void *seq_next(struct seq_file *m, void *p, loff_t *pos)
+{
+	struct proc_fdinfo_extra *extra = m->private;
+
+	if (extra->helper)
+		return extra->helper->ops->next(m, p, pos);
+
+	++*pos;
+	return NULL;
+}
 
 static int seq_show(struct seq_file *m, void *v)
 {
-	struct proc_fdinfo *fdinfo = m->private;
+	struct proc_fdinfo_extra *extra = m->private;
+
+	if (extra->helper && extra->hdr_shown)
+		return extra->helper->ops->show(m, v);
+
 	seq_printf(m, "pos:\t%lli\nflags:\t0%o\n",
-		   (long long)fdinfo->f_pos,
-		   fdinfo->f_flags);
+		   (long long)extra->fd_file->f_pos,
+		   extra->f_flags);
+
+	extra->hdr_shown = 1;
 	return 0;
 }
 
+static const struct seq_operations fdinfo_seq_ops = {
+	.start	= seq_start,
+	.next	= seq_next,
+	.stop	= seq_stop,
+	.show	= seq_show,
+};
+
 static int seq_fdinfo_open(struct inode *inode, struct file *file)
 {
 	struct files_struct *files = NULL;
-	struct proc_fdinfo *fdinfo = NULL;
+	struct proc_fdinfo_extra *extra;
 	struct task_struct *task;
-	int ret = -ENOENT;
+	struct seq_file *m;
+	int ret;
 
-	fdinfo = kzalloc(sizeof(*fdinfo), GFP_KERNEL);
-	if (!fdinfo)
+	extra = kzalloc(sizeof(*extra), GFP_KERNEL);
+	if (!extra)
 		return -ENOMEM;
+	extra->inode = inode;
 
-	task = get_proc_task(inode);
-	if (task) {
-		files = get_files_struct(task);
-		put_task_struct(task);
-	}
-
-	if (files) {
-		int fd = proc_fd(inode);
-		struct file *fd_file;
-
-		spin_lock(&files->file_lock);
-		fd_file = fcheck_files(files, fd);
-		if (fd_file) {
-			struct fdtable *fdt = files_fdtable(files);
-
-			fdinfo->f_flags = fd_file->f_flags & ~O_CLOEXEC;
-			if (close_on_exec(fd, fdt))
-				fdinfo->f_flags |= O_CLOEXEC;
-			ret = 0;
+	ret = seq_open(file, &fdinfo_seq_ops);
+	if (!ret) {
+		ret = -ENOENT;
+		m = file->private_data;
+		m->private = extra;
+
+		task = get_proc_task(inode);
+		if (task) {
+			files = get_files_struct(task);
+			put_task_struct(task);
 		}
-		spin_unlock(&files->file_lock);
-		put_files_struct(files);
-	}
 
-	if (!ret) {
-		ret = single_open(file, seq_show, fdinfo);
-		if (!ret)
-			fdinfo = NULL;
+		if (files) {
+			int fd = proc_fd(inode);
+
+			spin_lock(&files->file_lock);
+			extra->fd_file = fcheck_files(files, fd);
+			if (extra->fd_file) {
+				struct fdtable *fdt = files_fdtable(files);
+
+				extra->f_flags = extra->fd_file->f_flags & ~O_CLOEXEC;
+				if (close_on_exec(fd, fdt))
+					extra->f_flags |= O_CLOEXEC;
+				get_file(extra->fd_file);
+				ret = 0;
+			}
+			spin_unlock(&files->file_lock);
+			put_files_struct(files);
+
+			if (extra->fd_file)
+				assign_fdinfo_helper(extra);
+		}
 	}
 
-	kfree(fdinfo);
+	if (ret)
+		kfree(extra);
 	return ret;
 }
 
 static int seq_fdinfo_release(struct inode *inode, struct file *file)
 {
 	struct seq_file *m = file->private_data;
-	struct proc_fdinfo *fdinfo = m->private;
+	struct proc_fdinfo_extra *extra = m->private;
 
-	kfree(fdinfo);
+	put_filp(extra->fd_file);
+	kfree(m->private);
 
-	return single_release(inode, file);
+	return seq_release(inode, file);
 }
 
 static const struct file_operations proc_fdinfo_file_operations = {
Index: linux-2.6.git/include/linux/proc_fs.h
===================================================================
--- linux-2.6.git.orig/include/linux/proc_fs.h
+++ linux-2.6.git/include/linux/proc_fs.h
@@ -100,6 +100,23 @@ struct vmcore {
 	loff_t offset;
 };
 
+struct seq_operations;
+struct proc_fdinfo_helper {
+	struct list_head		list;
+	const char			*name;
+	const struct seq_operations	*ops;
+	int				(*probe)(struct file *file);
+};
+
+struct proc_fdinfo_extra {
+	struct proc_fdinfo_helper	*helper;
+	struct inode			*inode;
+	struct file			*fd_file;
+	unsigned int			f_flags;
+	void				*private;
+	int				hdr_shown;
+};
+
 #ifdef CONFIG_PROC_FS
 
 extern void proc_root_init(void);
@@ -175,6 +192,9 @@ extern struct proc_dir_entry *proc_net_m
 
 extern struct file *proc_ns_fget(int fd);
 
+extern int proc_register_fdinfo_helper(struct proc_fdinfo_helper *helper);
+extern void proc_unregister_fdinfo_helper(struct proc_fdinfo_helper *helper);
+
 #else
 
 #define proc_net_fops_create(net, name, mode, fops)  ({ (void)(mode), NULL; })
@@ -229,6 +249,9 @@ static inline struct file *proc_ns_fget(
 	return ERR_PTR(-EINVAL);
 }
 
+static inline int proc_register_fdinfo_helper(struct proc_fdinfo_helper *helper) { return -EINVAL; }
+static inline void proc_unregister_fdinfo_helper(struct proc_fdinfo_helper *helper) { }
+
 #endif /* CONFIG_PROC_FS */
 
 #if !defined(CONFIG_PROC_KCORE)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [rfc 0/4] procfs fdinfo extension
  2012-05-17 19:05 ` Andrew Morton
  2012-05-17 19:28   ` Cyrill Gorcunov
@ 2012-05-18 11:53   ` Pavel Emelyanov
  1 sibling, 0 replies; 14+ messages in thread
From: Pavel Emelyanov @ 2012-05-18 11:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Cyrill Gorcunov, linux-kernel, James Bottomley, linux-fsdevel

On 05/17/2012 11:05 PM, Andrew Morton wrote:
> On Thu, 17 May 2012 20:07:38 +0400
> Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> 
>> when we do restore files such as eventfd/eventpoll we need to pass
>> appropriate parameters to system calls.
> 
> What does "such as" mean?  Provide the whole list, please.

As far as this particular set is concerned -- we'll add inotify stuff
there and the signalfd info a bit later (no apps we test use it by now).
That's it. But once there will appear more fd-s based on anon_inode
and that are configured using custom syscalls, we'll have to implement
the "get info" API for them too.

> I assume we're going to have to add ~100 lines of stuff to each and 
> every one? Stuff which, according to this patchset, is needed even when
> CONFIG_CHECKPOINT_RESTORE=n?

Generally speaking -- yes. It can be useful to check e.g. what the event
counter is on some eventfd for debugging purposes. Right now there's no
way for getting this info.

> My reason for disliking our whole approach to integration of c/r is
> that it exposes us to an ongoing trickle of nasty surprises.  This
> patchset is one such nasty surprise, and we don't even know how
> extensive this particular surprise will be.

Well, I wouldn't say it's nasty. The problem with evetpoll, eventfd and
inotifies, is that each of them is a set-only API unlike most of the other
APIs for working with fds. I.e. -- you can configure your fd, it will work,
yes. But you have NO MEANS for finding out what you have configured 
previously. What we do is just make the API complete.

If you believe that using fdinfo file in proc for that is nasty, then we're
OK to rework this. But this approach was discussed on LSF this year, that's
why we've implemented it this way.

> And how many more surprises are we going to get?
> 
> 
> I'm quite apprehensive about this, largely because it has so many
> unknowns.  How much work would it be to prepare a full list of
> everything that still needs to be done to fully implement c/r in Linux?

I'd say, that the whole c/r project is being done to determine how much
information the kernel does NOT provide yet. And we really appreciate that
this happens on the mainstream kernel, rather than in a separate branch.

> .
> 

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-05-18 11:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-17 16:07 [rfc 0/4] procfs fdinfo extension Cyrill Gorcunov
2012-05-17 16:07 ` [rfc 1/4] procfs: Move /proc/pid/fd[info] handling code to fd.[ch] Cyrill Gorcunov
2012-05-17 16:07 ` [rfc 2/4] procfs: Convert /proc/pid/fdinfo/fd handling routines into seq-file with fdinfo helpers Cyrill Gorcunov
2012-05-17 16:32   ` Pavel Emelyanov
2012-05-17 17:38     ` Cyrill Gorcunov
2012-05-17 21:49       ` Cyrill Gorcunov
2012-05-17 16:07 ` [rfc 3/4] fs, eventfd: Add procfs fdinfo helper Cyrill Gorcunov
2012-05-17 16:34   ` Pavel Emelyanov
2012-05-17 17:43     ` Cyrill Gorcunov
2012-05-17 16:07 ` [rfc 4/4] fs, epoll: " Cyrill Gorcunov
2012-05-17 16:29 ` [rfc 0/4] procfs fdinfo extension Pavel Emelyanov
2012-05-17 19:05 ` Andrew Morton
2012-05-17 19:28   ` Cyrill Gorcunov
2012-05-18 11:53   ` Pavel Emelyanov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).