linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 00/30] Read-only bind mounts (-mm resend)
@ 2008-02-08 22:26 Dave Hansen
  2008-02-08 22:26 ` [RFC][PATCH 01/30] reiserfs: eliminate private use of struct file in xattr Dave Hansen
                   ` (32 more replies)
  0 siblings, 33 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen

This is against current Linus git
(a4ffc0a0b240a29cbe489f6db9dae112a49ef1c1).

This rolls up all the -mm bugfixes that were accumulated, and
addresses some new review comments from Al.  Also contains some
reworking from hch and a patch from Jeff Dike.

Just posting here to let everyone have a sniff before we resend
it back to -mm.

---

Why do we need r/o bind mounts?

This feature allows a read-only view into a read-write filesystem.
In the process of doing that, it also provides infrastructure for
keeping track of the number of writers to any given mount.

This has a number of uses.  It allows chroots to have parts of
filesystems writable.  It will be useful for containers in the future
because users may have root inside a container, but should not
be allowed to write to somefilesystems.  This also replaces 
patches that vserver has had out of the tree for several years.

It allows security enhancement by making sure that parts of
your filesystem are read-only (such as when you don't trust your
FTP server), when you don't want to have entire new filesystems
mounted, or when you want atime selectively updated.
I've been using this script:

	http://sr71.net/~dave/linux/robind-test.sh

to test that the feature is working as desired.  It takes a
directory and makes a regular bind and a r/o bind mount of it.
It then performs some normal filesystem operations on the
three directories, including ones that are expected to fail,
like creating a file on the r/o mount.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 01/30] reiserfs: eliminate private use of struct file in xattr
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
@ 2008-02-08 22:26 ` Dave Hansen
  2008-02-08 22:26 ` [RFC][PATCH 02/30] hppfs pass vfsmount to dentry_open() Dave Hansen
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


After several posts and bug reports regarding interaction with the NULL
nameidata, here's a patch to clean up the mess with struct file in the
reiserfs xattr code.

As observed in several of the posts, there's really no need for struct file
to exist in the xattr code.  It was really only passed around due to the
f_op->readdir() and a_ops->{prepare,commit}_write prototypes requiring it.

reiserfs_prepare_write() and reiserfs_commit_write() don't actually use the
struct file passed to it, and the xattr code uses a private version of
reiserfs_readdir() to enumerate the xattr directories.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/reiserfs/xattr.c |  110 +++++++++------------------------
 1 file changed, 30 insertions(+), 80 deletions(-)

diff -puN fs/reiserfs/xattr.c~reiserfs-eliminate-private-use-of-struct-file-in-xattr fs/reiserfs/xattr.c
--- linux-2.6.git/fs/reiserfs/xattr.c~reiserfs-eliminate-private-use-of-struct-file-in-xattr	2008-02-08 13:04:43.000000000 -0800
+++ linux-2.6.git-dave/fs/reiserfs/xattr.c	2008-02-08 13:04:43.000000000 -0800
@@ -191,28 +191,11 @@ static struct dentry *get_xa_file_dentry
 	dput(xadir);
 	if (err)
 		xafile = ERR_PTR(err);
-	return xafile;
-}
-
-/* Opens a file pointer to the attribute associated with inode */
-static struct file *open_xa_file(const struct inode *inode, const char *name,
-				 int flags)
-{
-	struct dentry *xafile;
-	struct file *fp;
-
-	xafile = get_xa_file_dentry(inode, name, flags);
-	if (IS_ERR(xafile))
-		return ERR_PTR(PTR_ERR(xafile));
 	else if (!xafile->d_inode) {
 		dput(xafile);
-		return ERR_PTR(-ENODATA);
+		xafile = ERR_PTR(-ENODATA);
 	}
-
-	fp = dentry_open(xafile, NULL, O_RDWR);
-	/* dentry_open dputs the dentry if it fails */
-
-	return fp;
+	return xafile;
 }
 
 /*
@@ -228,9 +211,8 @@ static struct file *open_xa_file(const s
  * we're called with i_mutex held, so there are no worries about the directory
  * changing underneath us.
  */
-static int __xattr_readdir(struct file *filp, void *dirent, filldir_t filldir)
+static int __xattr_readdir(struct inode *inode, void *dirent, filldir_t filldir)
 {
-	struct inode *inode = filp->f_path.dentry->d_inode;
 	struct cpu_key pos_key;	/* key of current position in the directory (key of directory entry) */
 	INITIALIZE_PATH(path_to_entry);
 	struct buffer_head *bh;
@@ -374,23 +356,16 @@ static int __xattr_readdir(struct file *
  *
  */
 static
-int xattr_readdir(struct file *file, filldir_t filler, void *buf)
+int xattr_readdir(struct inode *inode, filldir_t filler, void *buf)
 {
-	struct inode *inode = file->f_path.dentry->d_inode;
-	int res = -ENOTDIR;
-	if (!file->f_op || !file->f_op->readdir)
-		goto out;
+	int res = -ENOENT;
 	mutex_lock_nested(&inode->i_mutex, I_MUTEX_XATTR);
-//        down(&inode->i_zombie);
-	res = -ENOENT;
 	if (!IS_DEADDIR(inode)) {
 		lock_kernel();
-		res = __xattr_readdir(file, buf, filler);
+		res = __xattr_readdir(inode, buf, filler);
 		unlock_kernel();
 	}
-//        up(&inode->i_zombie);
 	mutex_unlock(&inode->i_mutex);
-      out:
 	return res;
 }
 
@@ -442,7 +417,7 @@ reiserfs_xattr_set(struct inode *inode, 
 		   size_t buffer_size, int flags)
 {
 	int err = 0;
-	struct file *fp;
+	struct dentry *dentry;
 	struct page *page;
 	char *data;
 	struct address_space *mapping;
@@ -460,18 +435,18 @@ reiserfs_xattr_set(struct inode *inode, 
 		xahash = xattr_hash(buffer, buffer_size);
 
       open_file:
-	fp = open_xa_file(inode, name, flags);
-	if (IS_ERR(fp)) {
-		err = PTR_ERR(fp);
+	dentry = get_xa_file_dentry(inode, name, flags);
+	if (IS_ERR(dentry)) {
+		err = PTR_ERR(dentry);
 		goto out;
 	}
 
-	xinode = fp->f_path.dentry->d_inode;
+	xinode = dentry->d_inode;
 	REISERFS_I(inode)->i_flags |= i_has_xattr_dir;
 
 	/* we need to copy it off.. */
 	if (xinode->i_nlink > 1) {
-		fput(fp);
+		dput(dentry);
 		err = reiserfs_xattr_del(inode, name);
 		if (err < 0)
 			goto out;
@@ -485,7 +460,7 @@ reiserfs_xattr_set(struct inode *inode, 
 	newattrs.ia_size = buffer_size;
 	newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME;
 	mutex_lock_nested(&xinode->i_mutex, I_MUTEX_XATTR);
-	err = notify_change(fp->f_path.dentry, &newattrs);
+	err = notify_change(dentry, &newattrs);
 	if (err)
 		goto out_filp;
 
@@ -518,15 +493,14 @@ reiserfs_xattr_set(struct inode *inode, 
 			rxh->h_hash = cpu_to_le32(xahash);
 		}
 
-		err = reiserfs_prepare_write(fp, page, page_offset,
+		err = reiserfs_prepare_write(NULL, page, page_offset,
 					    page_offset + chunk + skip);
 		if (!err) {
 			if (buffer)
 				memcpy(data + skip, buffer + buffer_pos, chunk);
-			err =
-			    reiserfs_commit_write(fp, page, page_offset,
-						  page_offset + chunk +
-						  skip);
+			err = reiserfs_commit_write(NULL, page, page_offset,
+						    page_offset + chunk +
+						    skip);
 		}
 		unlock_page(page);
 		reiserfs_put_page(page);
@@ -548,7 +522,7 @@ reiserfs_xattr_set(struct inode *inode, 
 
       out_filp:
 	mutex_unlock(&xinode->i_mutex);
-	fput(fp);
+	dput(dentry);
 
       out:
 	return err;
@@ -562,7 +536,7 @@ reiserfs_xattr_get(const struct inode *i
 		   size_t buffer_size)
 {
 	ssize_t err = 0;
-	struct file *fp;
+	struct dentry *dentry;
 	size_t isize;
 	size_t file_pos = 0;
 	size_t buffer_pos = 0;
@@ -578,13 +552,13 @@ reiserfs_xattr_get(const struct inode *i
 	if (get_inode_sd_version(inode) == STAT_DATA_V1)
 		return -EOPNOTSUPP;
 
-	fp = open_xa_file(inode, name, FL_READONLY);
-	if (IS_ERR(fp)) {
-		err = PTR_ERR(fp);
+	dentry = get_xa_file_dentry(inode, name, FL_READONLY);
+	if (IS_ERR(dentry)) {
+		err = PTR_ERR(dentry);
 		goto out;
 	}
 
-	xinode = fp->f_path.dentry->d_inode;
+	xinode = dentry->d_inode;
 	isize = xinode->i_size;
 	REISERFS_I(inode)->i_flags |= i_has_xattr_dir;
 
@@ -652,7 +626,7 @@ reiserfs_xattr_get(const struct inode *i
 	}
 
       out_dput:
-	fput(fp);
+	dput(dentry);
 
       out:
 	return err;
@@ -742,7 +716,6 @@ reiserfs_delete_xattrs_filler(void *buf,
 /* This is called w/ inode->i_mutex downed */
 int reiserfs_delete_xattrs(struct inode *inode)
 {
-	struct file *fp;
 	struct dentry *dir, *root;
 	int err = 0;
 
@@ -763,15 +736,8 @@ int reiserfs_delete_xattrs(struct inode 
 		return 0;
 	}
 
-	fp = dentry_open(dir, NULL, O_RDWR);
-	if (IS_ERR(fp)) {
-		err = PTR_ERR(fp);
-		/* dentry_open dputs the dentry if it fails */
-		goto out;
-	}
-
 	lock_kernel();
-	err = xattr_readdir(fp, reiserfs_delete_xattrs_filler, dir);
+	err = xattr_readdir(dir->d_inode, reiserfs_delete_xattrs_filler, dir);
 	if (err) {
 		unlock_kernel();
 		goto out_dir;
@@ -791,7 +757,7 @@ int reiserfs_delete_xattrs(struct inode 
 	unlock_kernel();
 
       out_dir:
-	fput(fp);
+	dput(dir);
 
       out:
 	if (!err)
@@ -833,7 +799,6 @@ reiserfs_chown_xattrs_filler(void *buf, 
 
 int reiserfs_chown_xattrs(struct inode *inode, struct iattr *attrs)
 {
-	struct file *fp;
 	struct dentry *dir;
 	int err = 0;
 	struct reiserfs_chown_buf buf;
@@ -857,13 +822,6 @@ int reiserfs_chown_xattrs(struct inode *
 		goto out;
 	}
 
-	fp = dentry_open(dir, NULL, O_RDWR);
-	if (IS_ERR(fp)) {
-		err = PTR_ERR(fp);
-		/* dentry_open dputs the dentry if it fails */
-		goto out;
-	}
-
 	lock_kernel();
 
 	attrs->ia_valid &= (ATTR_UID | ATTR_GID | ATTR_CTIME);
@@ -871,7 +829,7 @@ int reiserfs_chown_xattrs(struct inode *
 	buf.attrs = attrs;
 	buf.inode = inode;
 
-	err = xattr_readdir(fp, reiserfs_chown_xattrs_filler, &buf);
+	err = xattr_readdir(dir->d_inode, reiserfs_chown_xattrs_filler, &buf);
 	if (err) {
 		unlock_kernel();
 		goto out_dir;
@@ -881,7 +839,7 @@ int reiserfs_chown_xattrs(struct inode *
 	unlock_kernel();
 
       out_dir:
-	fput(fp);
+	dput(dir);
 
       out:
 	attrs->ia_valid = ia_valid;
@@ -1029,7 +987,6 @@ reiserfs_listxattr_filler(void *buf, con
  */
 ssize_t reiserfs_listxattr(struct dentry * dentry, char *buffer, size_t size)
 {
-	struct file *fp;
 	struct dentry *dir;
 	int err = 0;
 	struct reiserfs_listxattr_buf buf;
@@ -1052,13 +1009,6 @@ ssize_t reiserfs_listxattr(struct dentry
 		goto out;
 	}
 
-	fp = dentry_open(dir, NULL, O_RDWR);
-	if (IS_ERR(fp)) {
-		err = PTR_ERR(fp);
-		/* dentry_open dputs the dentry if it fails */
-		goto out;
-	}
-
 	buf.r_buf = buffer;
 	buf.r_size = buffer ? size : 0;
 	buf.r_pos = 0;
@@ -1066,7 +1016,7 @@ ssize_t reiserfs_listxattr(struct dentry
 
 	REISERFS_I(dentry->d_inode)->i_flags |= i_has_xattr_dir;
 
-	err = xattr_readdir(fp, reiserfs_listxattr_filler, &buf);
+	err = xattr_readdir(dir->d_inode, reiserfs_listxattr_filler, &buf);
 	if (err)
 		goto out_dir;
 
@@ -1076,7 +1026,7 @@ ssize_t reiserfs_listxattr(struct dentry
 		err = buf.r_pos;
 
       out_dir:
-	fput(fp);
+	dput(dir);
 
       out:
 	reiserfs_read_unlock_xattr_i(dentry->d_inode);
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 02/30] hppfs pass vfsmount to dentry_open()
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
  2008-02-08 22:26 ` [RFC][PATCH 01/30] reiserfs: eliminate private use of struct file in xattr Dave Hansen
@ 2008-02-08 22:26 ` Dave Hansen
  2008-02-08 22:26 ` [RFC][PATCH 03/30] check for null vfsmount in dentry_open() Dave Hansen
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


Here's patch for hppfs that uses vfs_kern_mount to make sure it always
has a procfs instance and passed the vfsmount on through the inode
private data.  Also fixes a procfs file_system_type leak for every attempted   
hppfs mount.

[ jdike - gave this file a style workover, plus deleted hppfs_dentry_ops ]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/hppfs/hppfs_kern.c |  364 ++++++++++++++++---------------
 1 file changed, 188 insertions(+), 176 deletions(-)

diff -puN fs/hppfs/hppfs_kern.c~hppfs-pass-vfsmount-to-dentry_open fs/hppfs/hppfs_kern.c
--- linux-2.6.git/fs/hppfs/hppfs_kern.c~hppfs-pass-vfsmount-to-dentry_open	2008-02-08 13:04:44.000000000 -0800
+++ linux-2.6.git-dave/fs/hppfs/hppfs_kern.c	2008-02-08 13:04:44.000000000 -0800
@@ -1,23 +1,25 @@
 /*
- * Copyright (C) 2002 Jeff Dike (jdike@karaya.com)
+ * Copyright (C) 2002 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
  * Licensed under the GPL
  */
 
-#include <linux/fs.h>
+#include <linux/ctype.h>
+#include <linux/dcache.h>
 #include <linux/file.h>
-#include <linux/module.h>
+#include <linux/fs.h>
 #include <linux/init.h>
-#include <linux/slab.h>
-#include <linux/list.h>
 #include <linux/kernel.h>
-#include <linux/ctype.h>
-#include <linux/dcache.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/slab.h>
 #include <linux/statfs.h>
+#include <linux/types.h>
 #include <asm/uaccess.h>
-#include <asm/fcntl.h>
 #include "os.h"
 
-static int init_inode(struct inode *inode, struct dentry *dentry);
+static int init_inode(struct inode *inode, struct dentry *dentry,
+		      struct vfsmount *mnt);
 
 struct hppfs_data {
 	struct list_head list;
@@ -33,6 +35,7 @@ struct hppfs_private {
 
 struct hppfs_inode_info {
         struct dentry *proc_dentry;
+	struct vfsmount *proc_mnt;
 	struct inode vfs_inode;
 };
 
@@ -51,14 +54,14 @@ static int is_pid(struct dentry *dentry)
 	int i;
 
 	sb = dentry->d_sb;
-	if((sb->s_op != &hppfs_sbops) || (dentry->d_parent != sb->s_root))
-		return(0);
+	if ((sb->s_op != &hppfs_sbops) || (dentry->d_parent != sb->s_root))
+		return 0;
 
-	for(i = 0; i < dentry->d_name.len; i++){
-		if(!isdigit(dentry->d_name.name[i]))
-			return(0);
+	for (i = 0; i < dentry->d_name.len; i++) {
+		if (!isdigit(dentry->d_name.name[i]))
+			return 0;
 	}
-	return(1);
+	return 1;
 }
 
 static char *dentry_name(struct dentry *dentry, int extra)
@@ -70,8 +73,8 @@ static char *dentry_name(struct dentry *
 
 	len = 0;
 	parent = dentry;
-	while(parent->d_parent != parent){
-		if(is_pid(parent))
+	while (parent->d_parent != parent) {
+		if (is_pid(parent))
 			len += strlen("pid") + 1;
 		else len += parent->d_name.len + 1;
 		parent = parent->d_parent;
@@ -80,12 +83,13 @@ static char *dentry_name(struct dentry *
 	root = "proc";
 	len += strlen(root);
 	name = kmalloc(len + extra + 1, GFP_KERNEL);
-	if(name == NULL) return(NULL);
+	if (name == NULL)
+		return NULL;
 
 	name[len] = '\0';
 	parent = dentry;
-	while(parent->d_parent != parent){
-		if(is_pid(parent)){
+	while (parent->d_parent != parent) {
+		if (is_pid(parent)) {
 			seg_name = "pid";
 			seg_len = strlen("pid");
 		}
@@ -100,27 +104,25 @@ static char *dentry_name(struct dentry *
 		parent = parent->d_parent;
 	}
 	strncpy(name, root, strlen(root));
-	return(name);
+	return name;
 }
 
-struct dentry_operations hppfs_dentry_ops = {
-};
-
 static int file_removed(struct dentry *dentry, const char *file)
 {
 	char *host_file;
 	int extra, fd;
 
 	extra = 0;
-	if(file != NULL) extra += strlen(file) + 1;
+	if (file != NULL)
+		extra += strlen(file) + 1;
 
 	host_file = dentry_name(dentry, extra + strlen("/remove"));
-	if(host_file == NULL){
-		printk("file_removed : allocation failed\n");
-		return(-ENOMEM);
+	if (host_file == NULL) {
+		printk(KERN_ERR "file_removed : allocation failed\n");
+		return -ENOMEM;
 	}
 
-	if(file != NULL){
+	if (file != NULL) {
 		strcat(host_file, "/");
 		strcat(host_file, file);
 	}
@@ -128,18 +130,18 @@ static int file_removed(struct dentry *d
 
 	fd = os_open_file(host_file, of_read(OPENFLAGS()), 0);
 	kfree(host_file);
-	if(fd > 0){
+	if (fd > 0) {
 		os_close_file(fd);
-		return(1);
+		return 1;
 	}
-	return(0);
+	return 0;
 }
 
 static void hppfs_read_inode(struct inode *ino)
 {
 	struct inode *proc_ino;
 
-	if(HPPFS_I(ino)->proc_dentry == NULL)
+	if (HPPFS_I(ino)->proc_dentry == NULL)
 		return;
 
 	proc_ino = HPPFS_I(ino)->proc_dentry->d_inode;
@@ -177,32 +179,32 @@ static struct dentry *hppfs_lookup(struc
 	int err, deleted;
 
 	deleted = file_removed(dentry, NULL);
-	if(deleted < 0)
-		return(ERR_PTR(deleted));
-	else if(deleted)
-		return(ERR_PTR(-ENOENT));
+	if (deleted < 0)
+		return ERR_PTR(deleted);
+	else if (deleted)
+		return ERR_PTR(-ENOENT);
 
 	err = -ENOMEM;
 	parent = HPPFS_I(ino)->proc_dentry;
 	mutex_lock(&parent->d_inode->i_mutex);
 	proc_dentry = d_lookup(parent, &dentry->d_name);
-	if(proc_dentry == NULL){
+	if (proc_dentry == NULL) {
 		proc_dentry = d_alloc(parent, &dentry->d_name);
-		if(proc_dentry == NULL){
+		if (proc_dentry == NULL) {
 			mutex_unlock(&parent->d_inode->i_mutex);
 			goto out;
 		}
 		new = (*parent->d_inode->i_op->lookup)(parent->d_inode,
 						       proc_dentry, NULL);
-		if(new){
+		if (new) {
 			dput(proc_dentry);
 			proc_dentry = new;
 		}
 	}
 	mutex_unlock(&parent->d_inode->i_mutex);
 
-	if(IS_ERR(proc_dentry))
-		return(proc_dentry);
+	if (IS_ERR(proc_dentry))
+		return proc_dentry;
 
 	inode = hppfs_iget(ino->i_sb);
 	if (IS_ERR(inode)) {
@@ -210,22 +212,21 @@ static struct dentry *hppfs_lookup(struc
 		goto out_dput;
 	}
 
-	err = init_inode(inode, proc_dentry);
-	if(err)
+	err = init_inode(inode, proc_dentry, HPPFS_I(ino)->proc_mnt);
+	if (err)
 		goto out_put;
 
 	hppfs_read_inode(inode);
 
  	d_add(dentry, inode);
-	dentry->d_op = &hppfs_dentry_ops;
-	return(NULL);
+	return NULL;
 
  out_put:
 	iput(inode);
  out_dput:
 	dput(proc_dentry);
  out:
-	return(ERR_PTR(err));
+	return ERR_PTR(err);
 }
 
 static const struct inode_operations hppfs_file_iops = {
@@ -239,15 +240,16 @@ static ssize_t read_proc(struct file *fi
 
 	read = file->f_path.dentry->d_inode->i_fop->read;
 
-	if(!is_user)
+	if (!is_user)
 		set_fs(KERNEL_DS);
 
 	n = (*read)(file, buf, count, &file->f_pos);
 
-	if(!is_user)
+	if (!is_user)
 		set_fs(USER_DS);
 
-	if(ppos) *ppos = file->f_pos;
+	if (ppos)
+		*ppos = file->f_pos;
 	return n;
 }
 
@@ -259,24 +261,23 @@ static ssize_t hppfs_read_file(int fd, c
 
 	n = -ENOMEM;
 	new_buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
-	if(new_buf == NULL){
-		printk("hppfs_read_file : kmalloc failed\n");
+	if (new_buf == NULL) {
+		printk(KERN_ERR "hppfs_read_file : kmalloc failed\n");
 		goto out;
 	}
 	n = 0;
-	while(count > 0){
+	while (count > 0) {
 		cur = min_t(ssize_t, count, PAGE_SIZE);
 		err = os_read_file(fd, new_buf, cur);
-		if(err < 0){
-			printk("hppfs_read : read failed, errno = %d\n",
-			       err);
+		if (err < 0) {
+			printk(KERN_ERR "hppfs_read : read failed, "
+			       "errno = %d\n", err);
 			n = err;
 			goto out_free;
-		}
-		else if(err == 0)
+		} else if (err == 0)
 			break;
 
-		if(copy_to_user(buf, new_buf, err)){
+		if (copy_to_user(buf, new_buf, err)) {
 			n = -EFAULT;
 			goto out_free;
 		}
@@ -297,35 +298,36 @@ static ssize_t hppfs_read(struct file *f
 	loff_t off;
 	int err;
 
-	if(hppfs->contents != NULL){
-		if(*ppos >= hppfs->len) return(0);
+	if (hppfs->contents != NULL) {
+		if (*ppos >= hppfs->len)
+			return 0;
 
 		data = hppfs->contents;
 		off = *ppos;
-		while(off >= sizeof(data->contents)){
+		while (off >= sizeof(data->contents)) {
 			data = list_entry(data->list.next, struct hppfs_data,
 					  list);
 			off -= sizeof(data->contents);
 		}
 
-		if(off + count > hppfs->len)
+		if (off + count > hppfs->len)
 			count = hppfs->len - off;
 		copy_to_user(buf, &data->contents[off], count);
 		*ppos += count;
-	}
-	else if(hppfs->host_fd != -1){
+	} else if (hppfs->host_fd != -1) {
 		err = os_seek_file(hppfs->host_fd, *ppos);
-		if(err){
-			printk("hppfs_read : seek failed, errno = %d\n", err);
-			return(err);
+		if (err) {
+			printk(KERN_ERR "hppfs_read : seek failed, "
+			       "errno = %d\n", err);
+			return err;
 		}
 		count = hppfs_read_file(hppfs->host_fd, buf, count);
-		if(count > 0)
+		if (count > 0)
 			*ppos += count;
 	}
 	else count = read_proc(hppfs->proc_file, buf, count, ppos, 1);
 
-	return(count);
+	return count;
 }
 
 static ssize_t hppfs_write(struct file *file, const char __user *buf, size_t len,
@@ -342,7 +344,7 @@ static ssize_t hppfs_write(struct file *
 	err = (*write)(proc_file, buf, len, &proc_file->f_pos);
 	file->f_pos = proc_file->f_pos;
 
-	return(err);
+	return err;
 }
 
 static int open_host_sock(char *host_file, int *filter_out)
@@ -354,13 +356,13 @@ static int open_host_sock(char *host_fil
 	strcpy(end, "/rw");
 	*filter_out = 1;
 	fd = os_connect_socket(host_file);
-	if(fd > 0)
-		return(fd);
+	if (fd > 0)
+		return fd;
 
 	strcpy(end, "/r");
 	*filter_out = 0;
 	fd = os_connect_socket(host_file);
-	return(fd);
+	return fd;
 }
 
 static void free_contents(struct hppfs_data *head)
@@ -368,9 +370,10 @@ static void free_contents(struct hppfs_d
 	struct hppfs_data *data;
 	struct list_head *ele, *next;
 
-	if(head == NULL) return;
+	if (head == NULL)
+		return;
 
-	list_for_each_safe(ele, next, &head->list){
+	list_for_each_safe(ele, next, &head->list) {
 		data = list_entry(ele, struct hppfs_data, list);
 		kfree(data);
 	}
@@ -387,8 +390,8 @@ static struct hppfs_data *hppfs_get_data
 
 	err = -ENOMEM;
 	data = kmalloc(sizeof(*data), GFP_KERNEL);
-	if(data == NULL){
-		printk("hppfs_get_data : head allocation failed\n");
+	if (data == NULL) {
+		printk(KERN_ERR "hppfs_get_data : head allocation failed\n");
 		goto failed;
 	}
 
@@ -397,36 +400,36 @@ static struct hppfs_data *hppfs_get_data
 	head = data;
 	*size_out = 0;
 
-	if(filter){
-		while((n = read_proc(proc_file, data->contents,
+	if (filter) {
+		while ((n = read_proc(proc_file, data->contents,
 				     sizeof(data->contents), NULL, 0)) > 0)
 			os_write_file(fd, data->contents, n);
 		err = os_shutdown_socket(fd, 0, 1);
-		if(err){
-			printk("hppfs_get_data : failed to shut down "
+		if (err) {
+			printk(KERN_ERR "hppfs_get_data : failed to shut down "
 			       "socket\n");
 			goto failed_free;
 		}
 	}
-	while(1){
+	while (1) {
 		n = os_read_file(fd, data->contents, sizeof(data->contents));
-		if(n < 0){
+		if (n < 0) {
 			err = n;
-			printk("hppfs_get_data : read failed, errno = %d\n",
-			       err);
+			printk(KERN_ERR "hppfs_get_data : read failed, "
+			       "errno = %d\n", err);
 			goto failed_free;
-		}
-		else if(n == 0)
+		} else if (n == 0)
 			break;
 
 		*size_out += n;
 
-		if(n < sizeof(data->contents))
+		if (n < sizeof(data->contents))
 			break;
 
 		new = kmalloc(sizeof(*data), GFP_KERNEL);
-		if(new == 0){
-			printk("hppfs_get_data : data allocation failed\n");
+		if (new == 0) {
+			printk(KERN_ERR "hppfs_get_data : data allocation "
+			       "failed\n");
 			err = -ENOMEM;
 			goto failed_free;
 		}
@@ -435,12 +438,12 @@ static struct hppfs_data *hppfs_get_data
 		list_add(&new->list, &data->list);
 		data = new;
 	}
-	return(head);
+	return head;
 
  failed_free:
 	free_contents(head);
  failed:
-	return(ERR_PTR(err));
+	return ERR_PTR(err);
 }
 
 static struct hppfs_private *hppfs_data(void)
@@ -448,77 +451,79 @@ static struct hppfs_private *hppfs_data(
 	struct hppfs_private *data;
 
 	data = kmalloc(sizeof(*data), GFP_KERNEL);
-	if(data == NULL)
-		return(data);
+	if (data == NULL)
+		return data;
 
 	*data = ((struct hppfs_private ) { .host_fd  		= -1,
 					   .len  		= -1,
 					   .contents 		= NULL } );
-	return(data);
+	return data;
 }
 
 static int file_mode(int fmode)
 {
-	if(fmode == (FMODE_READ | FMODE_WRITE))
-		return(O_RDWR);
-	if(fmode == FMODE_READ)
-		return(O_RDONLY);
-	if(fmode == FMODE_WRITE)
-		return(O_WRONLY);
-	return(0);
+	if (fmode == (FMODE_READ | FMODE_WRITE))
+		return O_RDWR;
+	if (fmode == FMODE_READ)
+		return O_RDONLY;
+	if (fmode == FMODE_WRITE)
+		return O_WRONLY;
+	return 0;
 }
 
 static int hppfs_open(struct inode *inode, struct file *file)
 {
 	struct hppfs_private *data;
 	struct dentry *proc_dentry;
+	struct vfsmount *proc_mnt;
 	char *host_file;
 	int err, fd, type, filter;
 
 	err = -ENOMEM;
 	data = hppfs_data();
-	if(data == NULL)
+	if (data == NULL)
 		goto out;
 
 	host_file = dentry_name(file->f_path.dentry, strlen("/rw"));
-	if(host_file == NULL)
+	if (host_file == NULL)
 		goto out_free2;
 
 	proc_dentry = HPPFS_I(inode)->proc_dentry;
+	proc_mnt = HPPFS_I(inode)->proc_mnt;
 
 	/* XXX This isn't closed anywhere */
-	data->proc_file = dentry_open(dget(proc_dentry), NULL,
+	data->proc_file = dentry_open(dget(proc_dentry), mntget(proc_mnt),
 				      file_mode(file->f_mode));
 	err = PTR_ERR(data->proc_file);
-	if(IS_ERR(data->proc_file))
+	if (IS_ERR(data->proc_file))
 		goto out_free1;
 
 	type = os_file_type(host_file);
-	if(type == OS_TYPE_FILE){
+	if (type == OS_TYPE_FILE) {
 		fd = os_open_file(host_file, of_read(OPENFLAGS()), 0);
-		if(fd >= 0)
+		if (fd >= 0)
 			data->host_fd = fd;
-		else printk("hppfs_open : failed to open '%s', errno = %d\n",
-			    host_file, -fd);
+		else
+			printk(KERN_ERR "hppfs_open : failed to open '%s', "
+			       "errno = %d\n", host_file, -fd);
 
 		data->contents = NULL;
-	}
-	else if(type == OS_TYPE_DIR){
+	} else if (type == OS_TYPE_DIR) {
 		fd = open_host_sock(host_file, &filter);
-		if(fd > 0){
+		if (fd > 0) {
 			data->contents = hppfs_get_data(fd, filter,
 							data->proc_file,
 							file, &data->len);
-			if(!IS_ERR(data->contents))
+			if (!IS_ERR(data->contents))
 				data->host_fd = fd;
-		}
-		else printk("hppfs_open : failed to open a socket in "
-			    "'%s', errno = %d\n", host_file, -fd);
+		} else
+			printk(KERN_ERR "hppfs_open : failed to open a socket "
+			       "in '%s', errno = %d\n", host_file, -fd);
 	}
 	kfree(host_file);
 
 	file->private_data = data;
-	return(0);
+	return 0;
 
  out_free1:
 	kfree(host_file);
@@ -526,34 +531,36 @@ static int hppfs_open(struct inode *inod
 	free_contents(data->contents);
 	kfree(data);
  out:
-	return(err);
+	return err;
 }
 
 static int hppfs_dir_open(struct inode *inode, struct file *file)
 {
 	struct hppfs_private *data;
 	struct dentry *proc_dentry;
+	struct vfsmount *proc_mnt;
 	int err;
 
 	err = -ENOMEM;
 	data = hppfs_data();
-	if(data == NULL)
+	if (data == NULL)
 		goto out;
 
 	proc_dentry = HPPFS_I(inode)->proc_dentry;
-	data->proc_file = dentry_open(dget(proc_dentry), NULL,
+	proc_mnt = HPPFS_I(inode)->proc_mnt;
+	data->proc_file = dentry_open(dget(proc_dentry), mntget(proc_mnt),
 				      file_mode(file->f_mode));
 	err = PTR_ERR(data->proc_file);
-	if(IS_ERR(data->proc_file))
+	if (IS_ERR(data->proc_file))
 		goto out_free;
 
 	file->private_data = data;
-	return(0);
+	return 0;
 
  out_free:
 	kfree(data);
  out:
-	return(err);
+	return err;
 }
 
 static loff_t hppfs_llseek(struct file *file, loff_t off, int where)
@@ -564,13 +571,13 @@ static loff_t hppfs_llseek(struct file *
 	loff_t ret;
 
 	llseek = proc_file->f_path.dentry->d_inode->i_fop->llseek;
-	if(llseek != NULL){
+	if (llseek != NULL) {
 		ret = (*llseek)(proc_file, off, where);
-		if(ret < 0)
-			return(ret);
+		if (ret < 0)
+			return ret;
 	}
 
-	return(default_llseek(file, off, where));
+	return default_llseek(file, off, where);
 }
 
 static const struct file_operations hppfs_file_fops = {
@@ -592,11 +599,11 @@ static int hppfs_filldir(void *d, const 
 {
 	struct hppfs_dirent *dirent = d;
 
-	if(file_removed(dirent->dentry, name))
-		return(0);
+	if (file_removed(dirent->dentry, name))
+		return 0;
 
-	return((*dirent->filldir)(dirent->vfs_dirent, name, size, offset,
-				  inode, type));
+	return (*dirent->filldir)(dirent->vfs_dirent, name, size, offset,
+				  inode, type);
 }
 
 static int hppfs_readdir(struct file *file, void *ent, filldir_t filldir)
@@ -607,7 +614,8 @@ static int hppfs_readdir(struct file *fi
 	struct hppfs_dirent dirent = ((struct hppfs_dirent)
 		                      { .vfs_dirent  	= ent,
 					.filldir 	= filldir,
-					.dentry  	= file->f_path.dentry } );
+					.dentry  	= file->f_path.dentry
+				      });
 	int err;
 
 	readdir = proc_file->f_path.dentry->d_inode->i_fop->readdir;
@@ -616,12 +624,12 @@ static int hppfs_readdir(struct file *fi
 	err = (*readdir)(proc_file, &dirent, hppfs_filldir);
 	file->f_pos = proc_file->f_pos;
 
-	return(err);
+	return err;
 }
 
 static int hppfs_fsync(struct file *file, struct dentry *dentry, int datasync)
 {
-	return(0);
+	return 0;
 }
 
 static const struct file_operations hppfs_dir_fops = {
@@ -639,7 +647,7 @@ static int hppfs_statfs(struct dentry *d
 	sf->f_files = 0;
 	sf->f_ffree = 0;
 	sf->f_type = HPPFS_SUPER_MAGIC;
-	return(0);
+	return 0;
 }
 
 static struct inode *hppfs_alloc_inode(struct super_block *sb)
@@ -647,12 +655,13 @@ static struct inode *hppfs_alloc_inode(s
 	struct hppfs_inode_info *hi;
 
 	hi = kmalloc(sizeof(*hi), GFP_KERNEL);
-	if(hi == NULL)
-		return(NULL);
+	if (!hi)
+		return NULL;
 
-	*hi = ((struct hppfs_inode_info) { .proc_dentry	= NULL });
+	hi->proc_dentry = NULL;
+	hi->proc_mnt = NULL;
 	inode_init_once(&hi->vfs_inode);
-	return(&hi->vfs_inode);
+	return &hi->vfs_inode;
 }
 
 void hppfs_delete_inode(struct inode *ino)
@@ -665,21 +674,31 @@ static void hppfs_destroy_inode(struct i
 	kfree(HPPFS_I(inode));
 }
 
+static void hppfs_put_super(struct super_block *sb)
+{
+	mntput(HPPFS_I(sb->s_root->d_inode)->proc_mnt);
+}
+
 static const struct super_operations hppfs_sbops = {
 	.alloc_inode	= hppfs_alloc_inode,
 	.destroy_inode	= hppfs_destroy_inode,
 	.delete_inode	= hppfs_delete_inode,
 	.statfs		= hppfs_statfs,
+	.put_super	= hppfs_put_super,
 };
 
-static int hppfs_readlink(struct dentry *dentry, char __user *buffer, int buflen)
+static int hppfs_readlink(struct dentry *dentry, char __user *buffer,
+			  int buflen)
 {
 	struct file *proc_file;
 	struct dentry *proc_dentry;
+	struct vfsmount *proc_mnt;
 	int ret;
 
 	proc_dentry = HPPFS_I(dentry->d_inode)->proc_dentry;
-	proc_file = dentry_open(dget(proc_dentry), NULL, O_RDONLY);
+	proc_mnt = HPPFS_I(dentry->d_inode)->proc_mnt;
+
+	proc_file = dentry_open(dget(proc_dentry), mntget(proc_mnt), O_RDONLY);
 	if (IS_ERR(proc_file))
 		return PTR_ERR(proc_file);
 
@@ -694,10 +713,13 @@ static void* hppfs_follow_link(struct de
 {
 	struct file *proc_file;
 	struct dentry *proc_dentry;
+	struct vfsmount *proc_mnt;
 	void *ret;
 
 	proc_dentry = HPPFS_I(dentry->d_inode)->proc_dentry;
-	proc_file = dentry_open(dget(proc_dentry), NULL, O_RDONLY);
+	proc_mnt = HPPFS_I(dentry->d_inode)->proc_mnt;
+
+	proc_file = dentry_open(dget(proc_dentry), mntget(proc_mnt), O_RDONLY);
 	if (IS_ERR(proc_file))
 		return proc_file;
 
@@ -717,44 +739,43 @@ static const struct inode_operations hpp
 	.follow_link	= hppfs_follow_link,
 };
 
-static int init_inode(struct inode *inode, struct dentry *dentry)
+static int init_inode(struct inode *inode, struct dentry *dentry,
+		struct vfsmount *mnt)
 {
-	if(S_ISDIR(dentry->d_inode->i_mode)){
+	if (S_ISDIR(dentry->d_inode->i_mode)) {
 		inode->i_op = &hppfs_dir_iops;
 		inode->i_fop = &hppfs_dir_fops;
-	}
-	else if(S_ISLNK(dentry->d_inode->i_mode)){
+	} else if (S_ISLNK(dentry->d_inode->i_mode)) {
 		inode->i_op = &hppfs_link_iops;
 		inode->i_fop = &hppfs_file_fops;
-	}
-	else {
+	} else {
 		inode->i_op = &hppfs_file_iops;
 		inode->i_fop = &hppfs_file_fops;
 	}
 
 	HPPFS_I(inode)->proc_dentry = dentry;
+	HPPFS_I(inode)->proc_mnt = mnt;
 
-	return(0);
+	return 0;
 }
 
 static int hppfs_fill_super(struct super_block *sb, void *d, int silent)
 {
 	struct inode *root_inode;
 	struct file_system_type *procfs;
-	struct super_block *proc_sb;
+	struct vfsmount *proc_mnt;
 	int err;
 
 	err = -ENOENT;
 	procfs = get_fs_type("proc");
-	if(procfs == NULL)
+	if (!procfs)
 		goto out;
 
-	if(list_empty(&procfs->fs_supers))
+	proc_mnt = vfs_kern_mount(procfs, 0, procfs->name, NULL);
+	put_filesystem(procfs);
+	if (IS_ERR(proc_mnt))
 		goto out;
 
-	proc_sb = list_entry(procfs->fs_supers.next, struct super_block,
-			     s_instances);
-
 	sb->s_blocksize = 1024;
 	sb->s_blocksize_bits = 10;
 	sb->s_magic = HPPFS_SUPER_MAGIC;
@@ -766,21 +787,23 @@ static int hppfs_fill_super(struct super
 		goto out;
 	}
 
-	err = init_inode(root_inode, proc_sb->s_root);
-	if(err)
-		goto out_put;
+	err = init_inode(root_inode, proc_mnt->mnt_sb->s_root, proc_mnt);
+	if (err)
+		goto out_iput;
 
 	err = -ENOMEM;
 	sb->s_root = d_alloc_root(root_inode);
-	if(sb->s_root == NULL)
-		goto out_put;
+	if (!sb->s_root)
+		goto out_iput;
 
 	hppfs_read_inode(root_inode);
 
-	return(0);
+	return 0;
 
- out_put:
+ out_iput:
 	iput(root_inode);
+ out_mntput:
+	mntput(proc_mnt);
  out:
 	return(err);
 }
@@ -802,7 +825,7 @@ static struct file_system_type hppfs_typ
 
 static int __init init_hppfs(void)
 {
-	return(register_filesystem(&hppfs_type));
+	return register_filesystem(&hppfs_type);
 }
 
 static void __exit exit_hppfs(void)
@@ -813,14 +836,3 @@ static void __exit exit_hppfs(void)
 module_init(init_hppfs)
 module_exit(exit_hppfs)
 MODULE_LICENSE("GPL");
-
-/*
- * Overrides for Emacs so that we follow Linus's tabbing style.
- * Emacs will notice this stuff at the end of the file and automatically
- * adjust the settings for this buffer only.  This must remain at the end
- * of the file.
- * ---------------------------------------------------------------------------
- * Local variables:
- * c-file-style: "linux"
- * End:
- */
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 03/30] check for null vfsmount in dentry_open()
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
  2008-02-08 22:26 ` [RFC][PATCH 01/30] reiserfs: eliminate private use of struct file in xattr Dave Hansen
  2008-02-08 22:26 ` [RFC][PATCH 02/30] hppfs pass vfsmount to dentry_open() Dave Hansen
@ 2008-02-08 22:26 ` Dave Hansen
  2008-02-08 22:26 ` [RFC][PATCH 04/30] fix up new filp allocators Dave Hansen
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


Make sure no-one calls dentry_open with a NULL vfsmount argument and crap
out with a stacktrace otherwise.  A NULL file->f_vfsmnt has always been
problematic, but with the per-mount r/o tracking we can't accept anymore
at all.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/open.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff -puN fs/open.c~check-for-NULL-vfsmount-in-dentry-open fs/open.c
--- linux-2.6.git/fs/open.c~check-for-NULL-vfsmount-in-dentry-open	2008-02-08 13:04:45.000000000 -0800
+++ linux-2.6.git-dave/fs/open.c	2008-02-08 13:04:45.000000000 -0800
@@ -906,6 +906,18 @@ struct file *dentry_open(struct dentry *
 	int error;
 	struct file *f;
 
+	/*
+	 * We must always pass in a valid mount pointer.   Historically
+	 * callers got away with not passing it, but we must enforce this at
+	 * the earliest possible point now to avoid strange problems deep in the
+	 * filesystem stack.
+	 */
+	if (!mnt) {
+		printk(KERN_WARNING "%s called with NULL vfsmount\n", __func__);
+		dump_stack();
+		return ERR_PTR(-EINVAL);
+	}
+
 	error = -ENFILE;
 	f = get_empty_filp();
 	if (f == NULL) {
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 04/30] fix up new filp allocators
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (2 preceding siblings ...)
  2008-02-08 22:26 ` [RFC][PATCH 03/30] check for null vfsmount in dentry_open() Dave Hansen
@ 2008-02-08 22:26 ` Dave Hansen
  2008-02-08 22:26 ` [RFC][PATCH 05/30] do namei_flags calculation inside open_namei() Dave Hansen
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


Some new uses of get_empty_filp() have crept in, and are
not properly taking mnt_want_write()s.  This fixes them
up.

We really need to kill get_empty_filp().

Cc: Erez Zadok <ezk@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/anon_inodes.c |   16 ++++++----------
 linux-2.6.git-dave/fs/file_table.c  |    6 ++++++
 linux-2.6.git-dave/fs/pipe.c        |   19 +++++++++----------
 3 files changed, 21 insertions(+), 20 deletions(-)

diff -puN fs/anon_inodes.c~fix-up-new-filp-allocators fs/anon_inodes.c
--- linux-2.6.git/fs/anon_inodes.c~fix-up-new-filp-allocators	2008-02-08 13:04:45.000000000 -0800
+++ linux-2.6.git-dave/fs/anon_inodes.c	2008-02-08 13:04:45.000000000 -0800
@@ -81,13 +81,10 @@ int anon_inode_getfd(int *pfd, struct in
 
 	if (IS_ERR(anon_inode_inode))
 		return -ENODEV;
-	file = get_empty_filp();
-	if (!file)
-		return -ENFILE;
 
 	error = get_unused_fd();
 	if (error < 0)
-		goto err_put_filp;
+		return error;
 	fd = error;
 
 	/*
@@ -114,14 +111,15 @@ int anon_inode_getfd(int *pfd, struct in
 	dentry->d_flags &= ~DCACHE_UNHASHED;
 	d_instantiate(dentry, anon_inode_inode);
 
-	file->f_path.mnt = mntget(anon_inode_mnt);
-	file->f_path.dentry = dentry;
+	error = -ENFILE;
+	file = alloc_file(anon_inode_mnt, dentry,
+			  FMODE_READ | FMODE_WRITE, fops);
+	if (!file)
+		goto err_put_unused_fd;
 	file->f_mapping = anon_inode_inode->i_mapping;
 
 	file->f_pos = 0;
 	file->f_flags = O_RDWR;
-	file->f_op = fops;
-	file->f_mode = FMODE_READ | FMODE_WRITE;
 	file->f_version = 0;
 	file->private_data = priv;
 
@@ -134,8 +132,6 @@ int anon_inode_getfd(int *pfd, struct in
 
 err_put_unused_fd:
 	put_unused_fd(fd);
-err_put_filp:
-	put_filp(file);
 	return error;
 }
 EXPORT_SYMBOL_GPL(anon_inode_getfd);
diff -puN fs/file_table.c~fix-up-new-filp-allocators fs/file_table.c
--- linux-2.6.git/fs/file_table.c~fix-up-new-filp-allocators	2008-02-08 13:04:45.000000000 -0800
+++ linux-2.6.git-dave/fs/file_table.c	2008-02-08 13:04:45.000000000 -0800
@@ -83,6 +83,12 @@ int proc_nr_files(ctl_table *table, int 
 /* Find an unused file structure and return a pointer to it.
  * Returns NULL, if there are no more free file structures or
  * we run out of memory.
+ *
+ * Be very careful using this.  You are responsible for
+ * getting write access to any mount that you might assign
+ * to this filp, if it is opened for write.  If this is not
+ * done, you will imbalance int the mount's writer count
+ * and a warning at __fput() time.
  */
 struct file *get_empty_filp(void)
 {
diff -puN fs/pipe.c~fix-up-new-filp-allocators fs/pipe.c
--- linux-2.6.git/fs/pipe.c~fix-up-new-filp-allocators	2008-02-08 13:04:45.000000000 -0800
+++ linux-2.6.git-dave/fs/pipe.c	2008-02-08 13:04:45.000000000 -0800
@@ -959,13 +959,10 @@ struct file *create_write_pipe(void)
 	struct dentry *dentry;
 	struct qstr name = { .name = "" };
 
-	f = get_empty_filp();
-	if (!f)
-		return ERR_PTR(-ENFILE);
 	err = -ENFILE;
 	inode = get_pipe_inode();
 	if (!inode)
-		goto err_file;
+		goto err;
 
 	err = -ENOMEM;
 	dentry = d_alloc(pipe_mnt->mnt_sb->s_root, &name);
@@ -980,22 +977,24 @@ struct file *create_write_pipe(void)
 	 */
 	dentry->d_flags &= ~DCACHE_UNHASHED;
 	d_instantiate(dentry, inode);
-	f->f_path.mnt = mntget(pipe_mnt);
-	f->f_path.dentry = dentry;
+
+	err = -ENFILE;
+	f = alloc_file(pipe_mnt, dentry, FMODE_WRITE, &write_pipe_fops);
+	if (!f)
+		goto err_dentry;
 	f->f_mapping = inode->i_mapping;
 
 	f->f_flags = O_WRONLY;
-	f->f_op = &write_pipe_fops;
-	f->f_mode = FMODE_WRITE;
 	f->f_version = 0;
 
 	return f;
 
+ err_dentry:
+	dput(dentry);
  err_inode:
 	free_pipe_info(inode);
 	iput(inode);
- err_file:
-	put_filp(f);
+ err:
 	return ERR_PTR(err);
 }
 
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 05/30] do namei_flags calculation inside open_namei()
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (3 preceding siblings ...)
  2008-02-08 22:26 ` [RFC][PATCH 04/30] fix up new filp allocators Dave Hansen
@ 2008-02-08 22:26 ` Dave Hansen
  2008-02-08 22:26 ` [RFC][PATCH 06/30] make open_namei() return a filp Dave Hansen
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


My end goal here is to make sure all users of may_open()
return filps.  This will ensure that we properly release
mount write counts which were taken for the filp in
may_open().

This patch moves the sys_open flags to namei flags
calculation into fs/namei.c.  We'll shortly be moving
the nameidata_to_filp() calls into namei.c, and this
gets the sys_open flags to a place where we can get
at them when we need them.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/namei.c |   43 +++++++++++++++++++++++++++++++++---------
 linux-2.6.git-dave/fs/open.c  |   22 +--------------------
 2 files changed, 36 insertions(+), 29 deletions(-)

diff -puN fs/namei.c~do-namei_flags-calculation-inside-open_namei fs/namei.c
--- linux-2.6.git/fs/namei.c~do-namei_flags-calculation-inside-open_namei	2008-02-08 13:04:46.000000000 -0800
+++ linux-2.6.git-dave/fs/namei.c	2008-02-08 13:04:46.000000000 -0800
@@ -1674,7 +1674,12 @@ int may_open(struct nameidata *nd, int a
 	return 0;
 }
 
-static int open_namei_create(struct nameidata *nd, struct path *path,
+/*
+ * Be careful about ever adding any more callers of this
+ * function.  Its flags must be in the namei format, not
+ * what get passed to sys_open().
+ */
+static int __open_namei_create(struct nameidata *nd, struct path *path,
 				int flag, int mode)
 {
 	int error;
@@ -1693,26 +1698,46 @@ static int open_namei_create(struct name
 }
 
 /*
+ * Note that while the flag value (low two bits) for sys_open means:
+ *	00 - read-only
+ *	01 - write-only
+ *	10 - read-write
+ *	11 - special
+ * it is changed into
+ *	00 - no permissions needed
+ *	01 - read-permission
+ *	10 - write-permission
+ *	11 - read-write
+ * for the internal routines (ie open_namei()/follow_link() etc)
+ * This is more logical, and also allows the 00 "no perm needed"
+ * to be used for symlinks (where the permissions are checked
+ * later).
+ *
+*/
+static inline int open_to_namei_flags(int flag)
+{
+	if ((flag+1) & O_ACCMODE)
+		flag++;
+	return flag;
+}
+
+/*
  *	open_namei()
  *
  * namei for open - this is in fact almost the whole open-routine.
  *
  * Note that the low bits of "flag" aren't the same as in the open
- * system call - they are 00 - no permissions needed
- *			  01 - read permission needed
- *			  10 - write permission needed
- *			  11 - read/write permissions needed
- * which is a lot more logical, and also allows the "no perm" needed
- * for symlinks (where the permissions are checked later).
+ * system call.  See open_to_namei_flags().
  * SMP-safe
  */
-int open_namei(int dfd, const char *pathname, int flag,
+int open_namei(int dfd, const char *pathname, int open_flag,
 		int mode, struct nameidata *nd)
 {
 	int acc_mode, error;
 	struct path path;
 	struct dentry *dir;
 	int count = 0;
+	int flag = open_to_namei_flags(open_flag);
 
 	acc_mode = ACC_MODE(flag);
 
@@ -1773,7 +1798,7 @@ do_last:
 
 	/* Negative dentry, just create the file */
 	if (!path.dentry->d_inode) {
-		error = open_namei_create(nd, &path, flag, mode);
+		error = __open_namei_create(nd, &path, flag, mode);
 		if (error)
 			goto exit;
 		return 0;
diff -puN fs/open.c~do-namei_flags-calculation-inside-open_namei fs/open.c
--- linux-2.6.git/fs/open.c~do-namei_flags-calculation-inside-open_namei	2008-02-08 13:04:46.000000000 -0800
+++ linux-2.6.git-dave/fs/open.c	2008-02-08 13:04:46.000000000 -0800
@@ -800,31 +800,13 @@ cleanup_file:
 	return ERR_PTR(error);
 }
 
-/*
- * Note that while the flag value (low two bits) for sys_open means:
- *	00 - read-only
- *	01 - write-only
- *	10 - read-write
- *	11 - special
- * it is changed into
- *	00 - no permissions needed
- *	01 - read-permission
- *	10 - write-permission
- *	11 - read-write
- * for the internal routines (ie open_namei()/follow_link() etc). 00 is
- * used by symlinks.
- */
 static struct file *do_filp_open(int dfd, const char *filename, int flags,
 				 int mode)
 {
-	int namei_flags, error;
+	int error;
 	struct nameidata nd;
 
-	namei_flags = flags;
-	if ((namei_flags+1) & O_ACCMODE)
-		namei_flags++;
-
-	error = open_namei(dfd, filename, namei_flags, mode, &nd);
+	error = open_namei(dfd, filename, flags, mode, &nd);
 	if (!error)
 		return nameidata_to_filp(&nd, flags);
 
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 06/30] make open_namei() return a filp
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (4 preceding siblings ...)
  2008-02-08 22:26 ` [RFC][PATCH 05/30] do namei_flags calculation inside open_namei() Dave Hansen
@ 2008-02-08 22:26 ` Dave Hansen
  2008-02-09  5:09   ` Christoph Hellwig
  2008-02-08 22:26 ` [RFC][PATCH 07/30] r/o bind mounts: stub functions Dave Hansen
                   ` (26 subsequent siblings)
  32 siblings, 1 reply; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


open_namei() will, in the future, need to take mount write counts
over its creation and truncation (via may_open()) operations.  It
needs to keep these write counts until any potential filp that is
created gets __fput()'d.

This gets complicated in the error handling and becomes very murky
as to how far open_namei() actually got, and whether or not that
mount write count was taken.

Creating the filps inside of open_namei() lets us shift the write
count to be taken and released along with the filp.  We can hold
a temporary write count during those creation and truncation
operations, then release them once the write for the filp has
been established.

Any caller who gets a 'struct file' back must consider that filp
instantiated and fput() it normally.  The callers no longer
have to worry about ever manually releasing a mnt write count.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/namei.c         |  100 +++++++++++++++++++---------------
 linux-2.6.git-dave/fs/open.c          |   19 ------
 linux-2.6.git-dave/include/linux/fs.h |    3 -
 3 files changed, 59 insertions(+), 63 deletions(-)

diff -puN fs/namei.c~make-open_namei-return-a-filp fs/namei.c
--- linux-2.6.git/fs/namei.c~make-open_namei-return-a-filp	2008-02-08 13:04:47.000000000 -0800
+++ linux-2.6.git-dave/fs/namei.c	2008-02-08 13:04:47.000000000 -0800
@@ -1722,17 +1722,13 @@ static inline int open_to_namei_flags(in
 }
 
 /*
- *	open_namei()
- *
- * namei for open - this is in fact almost the whole open-routine.
- *
  * Note that the low bits of "flag" aren't the same as in the open
  * system call.  See open_to_namei_flags().
- * SMP-safe
  */
-int open_namei(int dfd, const char *pathname, int open_flag,
-		int mode, struct nameidata *nd)
+struct file *do_filp_open(int dfd, const char *pathname,
+		int open_flag, int mode)
 {
+	struct nameidata nd;
 	int acc_mode, error;
 	struct path path;
 	struct dentry *dir;
@@ -1755,18 +1751,19 @@ int open_namei(int dfd, const char *path
 	 */
 	if (!(flag & O_CREAT)) {
 		error = path_lookup_open(dfd, pathname, lookup_flags(flag),
-					 nd, flag);
+					 &nd, flag);
 		if (error)
-			return error;
+			return ERR_PTR(error);
 		goto ok;
 	}
 
 	/*
 	 * Create - we need to know the parent.
 	 */
-	error = path_lookup_create(dfd,pathname,LOOKUP_PARENT,nd,flag,mode);
+	error = path_lookup_create(dfd, pathname, LOOKUP_PARENT,
+				   &nd, flag, mode);
 	if (error)
-		return error;
+		return ERR_PTR(error);
 
 	/*
 	 * We have the parent and last component. First of all, check
@@ -1774,14 +1771,14 @@ int open_namei(int dfd, const char *path
 	 * will not do.
 	 */
 	error = -EISDIR;
-	if (nd->last_type != LAST_NORM || nd->last.name[nd->last.len])
+	if (nd.last_type != LAST_NORM || nd.last.name[nd.last.len])
 		goto exit;
 
-	dir = nd->dentry;
-	nd->flags &= ~LOOKUP_PARENT;
+	dir = nd.dentry;
+	nd.flags &= ~LOOKUP_PARENT;
 	mutex_lock(&dir->d_inode->i_mutex);
-	path.dentry = lookup_hash(nd);
-	path.mnt = nd->mnt;
+	path.dentry = lookup_hash(&nd);
+	path.mnt = nd.mnt;
 
 do_last:
 	error = PTR_ERR(path.dentry);
@@ -1790,18 +1787,18 @@ do_last:
 		goto exit;
 	}
 
-	if (IS_ERR(nd->intent.open.file)) {
+	if (IS_ERR(nd.intent.open.file)) {
 		mutex_unlock(&dir->d_inode->i_mutex);
-		error = PTR_ERR(nd->intent.open.file);
+		error = PTR_ERR(nd.intent.open.file);
 		goto exit_dput;
 	}
 
 	/* Negative dentry, just create the file */
 	if (!path.dentry->d_inode) {
-		error = __open_namei_create(nd, &path, flag, mode);
+		error = __open_namei_create(&nd, &path, flag, mode);
 		if (error)
 			goto exit;
-		return 0;
+		return nameidata_to_filp(&nd, open_flag);
 	}
 
 	/*
@@ -1826,23 +1823,23 @@ do_last:
 	if (path.dentry->d_inode->i_op && path.dentry->d_inode->i_op->follow_link)
 		goto do_link;
 
-	path_to_nameidata(&path, nd);
+	path_to_nameidata(&path, &nd);
 	error = -EISDIR;
 	if (path.dentry->d_inode && S_ISDIR(path.dentry->d_inode->i_mode))
 		goto exit;
 ok:
-	error = may_open(nd, acc_mode, flag);
+	error = may_open(&nd, acc_mode, flag);
 	if (error)
 		goto exit;
-	return 0;
+	return nameidata_to_filp(&nd, open_flag);
 
 exit_dput:
-	dput_path(&path, nd);
+	dput_path(&path, &nd);
 exit:
-	if (!IS_ERR(nd->intent.open.file))
-		release_open_intent(nd);
-	path_release(nd);
-	return error;
+	if (!IS_ERR(nd.intent.open.file))
+		release_open_intent(&nd);
+	path_release(&nd);
+	return ERR_PTR(error);
 
 do_link:
 	error = -ELOOP;
@@ -1858,43 +1855,60 @@ do_link:
 	 * stored in nd->last.name and we will have to putname() it when we
 	 * are done. Procfs-like symlinks just set LAST_BIND.
 	 */
-	nd->flags |= LOOKUP_PARENT;
-	error = security_inode_follow_link(path.dentry, nd);
+	nd.flags |= LOOKUP_PARENT;
+	error = security_inode_follow_link(path.dentry, &nd);
 	if (error)
 		goto exit_dput;
-	error = __do_follow_link(&path, nd);
+	error = __do_follow_link(&path, &nd);
 	if (error) {
 		/* Does someone understand code flow here? Or it is only
 		 * me so stupid? Anathema to whoever designed this non-sense
 		 * with "intent.open".
 		 */
-		release_open_intent(nd);
-		return error;
+		release_open_intent(&nd);
+		return ERR_PTR(error);
 	}
-	nd->flags &= ~LOOKUP_PARENT;
-	if (nd->last_type == LAST_BIND)
+	nd.flags &= ~LOOKUP_PARENT;
+	if (nd.last_type == LAST_BIND)
 		goto ok;
 	error = -EISDIR;
-	if (nd->last_type != LAST_NORM)
+	if (nd.last_type != LAST_NORM)
 		goto exit;
-	if (nd->last.name[nd->last.len]) {
-		__putname(nd->last.name);
+	if (nd.last.name[nd.last.len]) {
+		__putname(nd.last.name);
 		goto exit;
 	}
 	error = -ELOOP;
 	if (count++==32) {
-		__putname(nd->last.name);
+		__putname(nd.last.name);
 		goto exit;
 	}
-	dir = nd->dentry;
+	dir = nd.dentry;
 	mutex_lock(&dir->d_inode->i_mutex);
-	path.dentry = lookup_hash(nd);
-	path.mnt = nd->mnt;
-	__putname(nd->last.name);
+	path.dentry = lookup_hash(&nd);
+	path.mnt = nd.mnt;
+	__putname(nd.last.name);
 	goto do_last;
 }
 
 /**
+ * filp_open - open file and return file pointer
+ *
+ * @filename:	path to open
+ * @flags:	open flags as per the open(2) second argument
+ * @mode:	mode for the new file if O_CREAT is set, else ignored
+ *
+ * This is the helper to open a file from kernelspace if you really
+ * have to.  But in generally you should not do this, so please move
+ * along, nothing to see here..
+ */
+struct file *filp_open(const char *filename, int flags, int mode)
+{
+	return do_filp_open(AT_FDCWD, filename, flags, mode);
+}
+EXPORT_SYMBOL(filp_open);
+
+/**
  * lookup_create - lookup a dentry, creating it if it doesn't exist
  * @nd: nameidata info
  * @is_dir: directory flag
diff -puN fs/open.c~make-open_namei-return-a-filp fs/open.c
--- linux-2.6.git/fs/open.c~make-open_namei-return-a-filp	2008-02-08 13:04:47.000000000 -0800
+++ linux-2.6.git-dave/fs/open.c	2008-02-08 13:04:47.000000000 -0800
@@ -800,25 +800,6 @@ cleanup_file:
 	return ERR_PTR(error);
 }
 
-static struct file *do_filp_open(int dfd, const char *filename, int flags,
-				 int mode)
-{
-	int error;
-	struct nameidata nd;
-
-	error = open_namei(dfd, filename, flags, mode, &nd);
-	if (!error)
-		return nameidata_to_filp(&nd, flags);
-
-	return ERR_PTR(error);
-}
-
-struct file *filp_open(const char *filename, int flags, int mode)
-{
-	return do_filp_open(AT_FDCWD, filename, flags, mode);
-}
-EXPORT_SYMBOL(filp_open);
-
 /**
  * lookup_instantiate_filp - instantiates the open intent filp
  * @nd: pointer to nameidata
diff -puN include/linux/fs.h~make-open_namei-return-a-filp include/linux/fs.h
--- linux-2.6.git/include/linux/fs.h~make-open_namei-return-a-filp	2008-02-08 13:04:47.000000000 -0800
+++ linux-2.6.git-dave/include/linux/fs.h	2008-02-08 13:04:47.000000000 -0800
@@ -1729,7 +1729,8 @@ extern struct file *create_read_pipe(str
 extern struct file *create_write_pipe(void);
 extern void free_write_pipe(struct file *);
 
-extern int open_namei(int dfd, const char *, int, int, struct nameidata *);
+extern struct file *do_filp_open(int dfd, const char *pathname,
+		int open_flag, int mode);
 extern int may_open(struct nameidata *, int, int);
 
 extern int kernel_read(struct file *, unsigned long, char *, unsigned long);
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 07/30] r/o bind mounts: stub functions
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (5 preceding siblings ...)
  2008-02-08 22:26 ` [RFC][PATCH 06/30] make open_namei() return a filp Dave Hansen
@ 2008-02-08 22:26 ` Dave Hansen
  2008-02-08 22:26 ` [RFC][PATCH 08/30] r/o bind mounts: create helper to drop file write access Dave Hansen
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


This patch adds two function mnt_want_write() and mnt_drop_write().  These are
used like a lock pair around and fs operations that might cause a write to the
filesystem.

Before these can become useful, we must first cover each place in the VFS
where writes are performed with a want/drop pair.  When that is complete, we
can actually introduce code that will safely check the counts before allowing
r/w<->r/o transitions to occur.

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/namespace.c        |   54 +++++++++++++++++++++++++++++++
 linux-2.6.git-dave/include/linux/mount.h |    3 +
 2 files changed, 57 insertions(+)

diff -puN fs/namespace.c~r-o-bind-mounts-stub-functions fs/namespace.c
--- linux-2.6.git/fs/namespace.c~r-o-bind-mounts-stub-functions	2008-02-08 13:04:47.000000000 -0800
+++ linux-2.6.git-dave/fs/namespace.c	2008-02-08 13:04:47.000000000 -0800
@@ -80,6 +80,60 @@ struct vfsmount *alloc_vfsmnt(const char
 	return mnt;
 }
 
+/*
+ * Most r/o checks on a fs are for operations that take
+ * discrete amounts of time, like a write() or unlink().
+ * We must keep track of when those operations start
+ * (for permission checks) and when they end, so that
+ * we can determine when writes are able to occur to
+ * a filesystem.
+ */
+/**
+ * mnt_want_write - get write access to a mount
+ * @mnt: the mount on which to take a write
+ *
+ * This tells the low-level filesystem that a write is
+ * about to be performed to it, and makes sure that
+ * writes are allowed before returning success.  When
+ * the write operation is finished, mnt_drop_write()
+ * must be called.  This is effectively a refcount.
+ */
+int mnt_want_write(struct vfsmount *mnt)
+{
+	if (__mnt_is_readonly(mnt))
+		return -EROFS;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mnt_want_write);
+
+/**
+ * mnt_drop_write - give up write access to a mount
+ * @mnt: the mount on which to give up write access
+ *
+ * Tells the low-level filesystem that we are done
+ * performing writes to it.  Must be matched with
+ * mnt_want_write() call above.
+ */
+void mnt_drop_write(struct vfsmount *mnt)
+{
+}
+EXPORT_SYMBOL_GPL(mnt_drop_write);
+
+/*
+ * __mnt_is_readonly: check whether a mount is read-only
+ * @mnt: the mount to check for its write status
+ *
+ * This shouldn't be used directly ouside of the VFS.
+ * It does not guarantee that the filesystem will stay
+ * r/w, just that it is right *now*.  This can not and
+ * should not be used in place of IS_RDONLY(inode).
+ */
+int __mnt_is_readonly(struct vfsmount *mnt)
+{
+	return (mnt->mnt_sb->s_flags & MS_RDONLY);
+}
+EXPORT_SYMBOL_GPL(__mnt_is_readonly);
+
 int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
diff -puN include/linux/mount.h~r-o-bind-mounts-stub-functions include/linux/mount.h
--- linux-2.6.git/include/linux/mount.h~r-o-bind-mounts-stub-functions	2008-02-08 13:04:47.000000000 -0800
+++ linux-2.6.git-dave/include/linux/mount.h	2008-02-08 13:04:47.000000000 -0800
@@ -70,9 +70,12 @@ static inline struct vfsmount *mntget(st
 	return mnt;
 }
 
+extern int mnt_want_write(struct vfsmount *mnt);
+extern void mnt_drop_write(struct vfsmount *mnt);
 extern void mntput_no_expire(struct vfsmount *mnt);
 extern void mnt_pin(struct vfsmount *mnt);
 extern void mnt_unpin(struct vfsmount *mnt);
+extern int __mnt_is_readonly(struct vfsmount *mnt);
 
 static inline void mntput(struct vfsmount *mnt)
 {
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 08/30] r/o bind mounts: create helper to drop file write access
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (6 preceding siblings ...)
  2008-02-08 22:26 ` [RFC][PATCH 07/30] r/o bind mounts: stub functions Dave Hansen
@ 2008-02-08 22:26 ` Dave Hansen
  2008-02-08 22:26 ` [RFC][PATCH 09/30] r/o bind mounts: drop write during emergency remount Dave Hansen
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


If someone decides to demote a file from r/w to just
r/o, they can use this same code as __fput().

NFS does just that, and will use this in the next
patch.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Erez Zadok <ezk@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/file_table.c      |   19 ++++++++++++++++++-
 linux-2.6.git-dave/fs/nfsd/nfs4state.c  |    3 ++-
 linux-2.6.git-dave/include/linux/file.h |    1 +
 3 files changed, 21 insertions(+), 2 deletions(-)

diff -puN fs/file_table.c~create-file_drop_write_access-helper fs/file_table.c
--- linux-2.6.git/fs/file_table.c~create-file_drop_write_access-helper	2008-02-08 13:04:48.000000000 -0800
+++ linux-2.6.git-dave/fs/file_table.c	2008-02-08 13:04:48.000000000 -0800
@@ -211,6 +211,23 @@ void fastcall fput(struct file *file)
 
 EXPORT_SYMBOL(fput);
 
+/**
+ * drop_file_write_access - give up ability to write to a file
+ * @file: the file to which we will stop writing
+ *
+ * This is a central place which will give up the ability
+ * to write to @file, along with access to write through
+ * its vfsmount.
+ */
+void drop_file_write_access(struct file *file)
+{
+	struct dentry *dentry = file->f_path.dentry;
+	struct inode *inode = dentry->d_inode;
+
+	put_write_access(inode);
+}
+EXPORT_SYMBOL_GPL(drop_file_write_access);
+
 /* __fput is called from task context when aio completion releases the last
  * last use of a struct file *.  Do not use otherwise.
  */
@@ -237,7 +254,7 @@ void fastcall __fput(struct file *file)
 		cdev_put(inode->i_cdev);
 	fops_put(file->f_op);
 	if (file->f_mode & FMODE_WRITE)
-		put_write_access(inode);
+		drop_file_write_access(file);
 	put_pid(file->f_owner.pid);
 	file_kill(file);
 	file->f_path.dentry = NULL;
diff -puN fs/nfsd/nfs4state.c~create-file_drop_write_access-helper fs/nfsd/nfs4state.c
--- linux-2.6.git/fs/nfsd/nfs4state.c~create-file_drop_write_access-helper	2008-02-08 13:04:48.000000000 -0800
+++ linux-2.6.git-dave/fs/nfsd/nfs4state.c	2008-02-08 13:04:48.000000000 -0800
@@ -41,6 +41,7 @@
 #include <linux/sunrpc/svc.h>
 #include <linux/nfsd/nfsd.h>
 #include <linux/nfsd/cache.h>
+#include <linux/file.h>
 #include <linux/mount.h>
 #include <linux/workqueue.h>
 #include <linux/smp_lock.h>
@@ -1239,7 +1240,7 @@ static inline void
 nfs4_file_downgrade(struct file *filp, unsigned int share_access)
 {
 	if (share_access & NFS4_SHARE_ACCESS_WRITE) {
-		put_write_access(filp->f_path.dentry->d_inode);
+		drop_file_write_access(filp);
 		filp->f_mode = (filp->f_mode | FMODE_READ) & ~FMODE_WRITE;
 	}
 }
diff -puN include/linux/file.h~create-file_drop_write_access-helper include/linux/file.h
--- linux-2.6.git/include/linux/file.h~create-file_drop_write_access-helper	2008-02-08 13:04:48.000000000 -0800
+++ linux-2.6.git-dave/include/linux/file.h	2008-02-08 13:04:48.000000000 -0800
@@ -61,6 +61,7 @@ extern struct kmem_cache *filp_cachep;
 
 extern void FASTCALL(__fput(struct file *));
 extern void FASTCALL(fput(struct file *));
+extern void drop_file_write_access(struct file *file);
 
 struct file_operations;
 struct vfsmount;
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 09/30] r/o bind mounts: drop write during emergency remount
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (7 preceding siblings ...)
  2008-02-08 22:26 ` [RFC][PATCH 08/30] r/o bind mounts: create helper to drop file write access Dave Hansen
@ 2008-02-08 22:26 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 10/30] r/o bind mounts: elevate write count for vfs_rmdir() Dave Hansen
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


The emergency remount code forcibly removes FMODE_WRITE from
filps.  The r/o bind mount code notices that this was done
without a proper mnt_drop_write() and properly gives a
warning.

This patch does a mnt_drop_write() to keep everything
balanced.
	
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/super.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff -puN fs/super.c~robind-sysrq-fix fs/super.c
--- linux-2.6.git/fs/super.c~robind-sysrq-fix	2008-02-08 13:04:48.000000000 -0800
+++ linux-2.6.git-dave/fs/super.c	2008-02-08 13:04:49.000000000 -0800
@@ -37,6 +37,7 @@
 #include <linux/idr.h>
 #include <linux/kobject.h>
 #include <linux/mutex.h>
+#include <linux/file.h>
 #include <asm/uaccess.h>
 
 
@@ -566,10 +567,25 @@ static void mark_files_ro(struct super_b
 {
 	struct file *f;
 
+retry:
 	file_list_lock();
 	list_for_each_entry(f, &sb->s_files, f_u.fu_list) {
-		if (S_ISREG(f->f_path.dentry->d_inode->i_mode) && file_count(f))
-			f->f_mode &= ~FMODE_WRITE;
+		struct vfsmount *mnt;
+		if (!S_ISREG(f->f_path.dentry->d_inode->i_mode))
+		       continue;
+		if (!file_count(f))
+			continue;
+		if (!(f->f_mode & FMODE_WRITE))
+			continue;
+		f->f_mode &= ~FMODE_WRITE;
+		mnt = f->f_path.mnt;
+		file_list_unlock();
+		/*
+		 * This can sleep, so we can't hold
+		 * the file_list_lock() spinlock.
+		 */
+		mnt_drop_write(mnt);
+		goto retry;
 	}
 	file_list_unlock();
 }
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 10/30] r/o bind mounts: elevate write count for vfs_rmdir()
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (8 preceding siblings ...)
  2008-02-08 22:26 ` [RFC][PATCH 09/30] r/o bind mounts: drop write during emergency remount Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 11/30] r/o bind mounts: elevate write count for callers of vfs_mkdir() Dave Hansen
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


Elevate the write count during the vfs_rmdir() call.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/namei.c |    5 +++++
 1 file changed, 5 insertions(+)

diff -puN fs/namei.c~r-o-bind-mounts-do_rmdir-elevate-write-count fs/namei.c
--- linux-2.6.git/fs/namei.c~r-o-bind-mounts-do_rmdir-elevate-write-count	2008-02-08 13:04:49.000000000 -0800
+++ linux-2.6.git-dave/fs/namei.c	2008-02-08 13:04:49.000000000 -0800
@@ -2187,7 +2187,12 @@ static long do_rmdir(int dfd, const char
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
 		goto exit2;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto exit3;
 	error = vfs_rmdir(nd.dentry->d_inode, dentry);
+	mnt_drop_write(nd.mnt);
+exit3:
 	dput(dentry);
 exit2:
 	mutex_unlock(&nd.dentry->d_inode->i_mutex);
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 11/30] r/o bind mounts: elevate write count for callers of vfs_mkdir()
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (9 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 10/30] r/o bind mounts: elevate write count for vfs_rmdir() Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 12/30] r/o bind mounts: elevate mnt_writers for unlink callers Dave Hansen
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


Pretty self-explanatory.  Fits in with the rest of the series.

Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/namei.c            |    5 +++++
 linux-2.6.git-dave/fs/nfsd/nfs4recover.c |    5 +++++
 2 files changed, 10 insertions(+)

diff -puN fs/namei.c~r-o-bind-mounts-elevate-mnt-writers-for-callers-of-vfs_mkdir fs/namei.c
--- linux-2.6.git/fs/namei.c~r-o-bind-mounts-elevate-mnt-writers-for-callers-of-vfs_mkdir	2008-02-08 13:04:50.000000000 -0800
+++ linux-2.6.git-dave/fs/namei.c	2008-02-08 13:04:50.000000000 -0800
@@ -2080,7 +2080,12 @@ asmlinkage long sys_mkdirat(int dfd, con
 
 	if (!IS_POSIXACL(nd.dentry->d_inode))
 		mode &= ~current->fs->umask;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto out_dput;
 	error = vfs_mkdir(nd.dentry->d_inode, dentry, mode);
+	mnt_drop_write(nd.mnt);
+out_dput:
 	dput(dentry);
 out_unlock:
 	mutex_unlock(&nd.dentry->d_inode->i_mutex);
diff -puN fs/nfsd/nfs4recover.c~r-o-bind-mounts-elevate-mnt-writers-for-callers-of-vfs_mkdir fs/nfsd/nfs4recover.c
--- linux-2.6.git/fs/nfsd/nfs4recover.c~r-o-bind-mounts-elevate-mnt-writers-for-callers-of-vfs_mkdir	2008-02-08 13:04:50.000000000 -0800
+++ linux-2.6.git-dave/fs/nfsd/nfs4recover.c	2008-02-08 13:04:50.000000000 -0800
@@ -41,6 +41,7 @@
 #include <linux/nfsd/xdr4.h>
 #include <linux/param.h>
 #include <linux/file.h>
+#include <linux/mount.h>
 #include <linux/namei.h>
 #include <asm/uaccess.h>
 #include <linux/scatterlist.h>
@@ -154,7 +155,11 @@ nfsd4_create_clid_dir(struct nfs4_client
 		dprintk("NFSD: nfsd4_create_clid_dir: DIRECTORY EXISTS\n");
 		goto out_put;
 	}
+	status = mnt_want_write(rec_dir.mnt);
+	if (status)
+		goto out_put;
 	status = vfs_mkdir(rec_dir.dentry->d_inode, dentry, S_IRWXU);
+	mnt_drop_write(rec_dir.mnt);
 out_put:
 	dput(dentry);
 out_unlock:
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 12/30] r/o bind mounts: elevate mnt_writers for unlink callers
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (10 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 11/30] r/o bind mounts: elevate write count for callers of vfs_mkdir() Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 13/30] r/o bind mounts: elevate write count for xattr_permission() callers Dave Hansen
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen



Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/namei.c   |    4 ++++
 linux-2.6.git-dave/ipc/mqueue.c |    5 ++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff -puN fs/namei.c~r-o-bind-mounts-elevate-mnt-writers-for-vfs_unlink-callers fs/namei.c
--- linux-2.6.git/fs/namei.c~r-o-bind-mounts-elevate-mnt-writers-for-vfs_unlink-callers	2008-02-08 13:04:50.000000000 -0800
+++ linux-2.6.git-dave/fs/namei.c	2008-02-08 13:04:50.000000000 -0800
@@ -2278,7 +2278,11 @@ static long do_unlinkat(int dfd, const c
 		inode = dentry->d_inode;
 		if (inode)
 			atomic_inc(&inode->i_count);
+		error = mnt_want_write(nd.mnt);
+		if (error)
+			goto exit2;
 		error = vfs_unlink(nd.dentry->d_inode, dentry);
+		mnt_drop_write(nd.mnt);
 	exit2:
 		dput(dentry);
 	}
diff -puN ipc/mqueue.c~r-o-bind-mounts-elevate-mnt-writers-for-vfs_unlink-callers ipc/mqueue.c
--- linux-2.6.git/ipc/mqueue.c~r-o-bind-mounts-elevate-mnt-writers-for-vfs_unlink-callers	2008-02-08 13:04:50.000000000 -0800
+++ linux-2.6.git-dave/ipc/mqueue.c	2008-02-08 13:04:50.000000000 -0800
@@ -743,8 +743,11 @@ asmlinkage long sys_mq_unlink(const char
 	inode = dentry->d_inode;
 	if (inode)
 		atomic_inc(&inode->i_count);
-
+	err = mnt_want_write(mqueue_mnt);
+	if (err)
+		goto out_err;
 	err = vfs_unlink(dentry->d_parent->d_inode, dentry);
+	mnt_drop_write(mqueue_mnt);
 out_err:
 	dput(dentry);
 
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 13/30] r/o bind mounts: elevate write count for xattr_permission() callers
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (11 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 12/30] r/o bind mounts: elevate mnt_writers for unlink callers Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 14/30] r/o bind mounts: elevate write count for ncp_ioctl() Dave Hansen
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


This basically audits the callers of xattr_permission(), which calls
permission() and can perform writes to the filesystem.

Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/nfsd/nfs4proc.c |    7 ++++++-
 linux-2.6.git-dave/fs/xattr.c         |   16 ++++++++++++++--
 2 files changed, 20 insertions(+), 3 deletions(-)

diff -puN fs/nfsd/nfs4proc.c~r-o-bind-mounts-elevate-mount-count-for-extended-attributes fs/nfsd/nfs4proc.c
--- linux-2.6.git/fs/nfsd/nfs4proc.c~r-o-bind-mounts-elevate-mount-count-for-extended-attributes	2008-02-08 13:04:51.000000000 -0800
+++ linux-2.6.git-dave/fs/nfsd/nfs4proc.c	2008-02-08 13:04:51.000000000 -0800
@@ -658,14 +658,19 @@ nfsd4_setattr(struct svc_rqst *rqstp, st
 			return status;
 		}
 	}
+	status = mnt_want_write(cstate->current_fh.fh_export->ex_mnt);
+	if (status)
+		return status;
 	status = nfs_ok;
 	if (setattr->sa_acl != NULL)
 		status = nfsd4_set_nfs4_acl(rqstp, &cstate->current_fh,
 					    setattr->sa_acl);
 	if (status)
-		return status;
+		goto out;
 	status = nfsd_setattr(rqstp, &cstate->current_fh, &setattr->sa_iattr,
 				0, (time_t)0);
+out:
+	mnt_drop_write(cstate->current_fh.fh_export->ex_mnt);
 	return status;
 }
 
diff -puN fs/xattr.c~r-o-bind-mounts-elevate-mount-count-for-extended-attributes fs/xattr.c
--- linux-2.6.git/fs/xattr.c~r-o-bind-mounts-elevate-mount-count-for-extended-attributes	2008-02-08 13:04:51.000000000 -0800
+++ linux-2.6.git-dave/fs/xattr.c	2008-02-08 13:04:51.000000000 -0800
@@ -11,6 +11,7 @@
 #include <linux/slab.h>
 #include <linux/file.h>
 #include <linux/xattr.h>
+#include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
@@ -32,8 +33,6 @@ xattr_permission(struct inode *inode, co
 	 * filesystem  or on an immutable / append-only inode.
 	 */
 	if (mask & MAY_WRITE) {
-		if (IS_RDONLY(inode))
-			return -EROFS;
 		if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
 			return -EPERM;
 	}
@@ -262,7 +261,11 @@ sys_setxattr(char __user *path, char __u
 	error = user_path_walk(path, &nd);
 	if (error)
 		return error;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		return error;
 	error = setxattr(nd.dentry, name, value, size, flags);
+	mnt_drop_write(nd.mnt);
 	path_release(&nd);
 	return error;
 }
@@ -277,7 +280,11 @@ sys_lsetxattr(char __user *path, char __
 	error = user_path_walk_link(path, &nd);
 	if (error)
 		return error;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		return error;
 	error = setxattr(nd.dentry, name, value, size, flags);
+	mnt_drop_write(nd.mnt);
 	path_release(&nd);
 	return error;
 }
@@ -293,9 +300,14 @@ sys_fsetxattr(int fd, char __user *name,
 	f = fget(fd);
 	if (!f)
 		return error;
+	error = mnt_want_write(f->f_vfsmnt);
+	if (error)
+		goto out_fput;
 	dentry = f->f_path.dentry;
 	audit_inode(NULL, dentry);
 	error = setxattr(dentry, name, value, size, flags);
+	mnt_drop_write(f->f_vfsmnt);
+out_fput:
 	fput(f);
 	return error;
 }
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 14/30] r/o bind mounts: elevate write count for ncp_ioctl()
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (12 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 13/30] r/o bind mounts: elevate write count for xattr_permission() callers Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 15/30] r/o bind mounts: write counts for time functions Dave Hansen
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/ncpfs/ioctl.c |   54 +++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff -puN fs/ncpfs/ioctl.c~r-o-bind-mounts-elevate-write-count-during-entire-ncp_ioctl fs/ncpfs/ioctl.c
--- linux-2.6.git/fs/ncpfs/ioctl.c~r-o-bind-mounts-elevate-write-count-during-entire-ncp_ioctl	2008-02-08 13:04:51.000000000 -0800
+++ linux-2.6.git-dave/fs/ncpfs/ioctl.c	2008-02-08 13:04:51.000000000 -0800
@@ -14,6 +14,7 @@
 #include <linux/ioctl.h>
 #include <linux/time.h>
 #include <linux/mm.h>
+#include <linux/mount.h>
 #include <linux/highuid.h>
 #include <linux/smp_lock.h>
 #include <linux/vmalloc.h>
@@ -261,7 +262,7 @@ ncp_get_charsets(struct ncp_server* serv
 }
 #endif /* CONFIG_NCPFS_NLS */
 
-int ncp_ioctl(struct inode *inode, struct file *filp,
+static int __ncp_ioctl(struct inode *inode, struct file *filp,
 	      unsigned int cmd, unsigned long arg)
 {
 	struct ncp_server *server = NCP_SERVER(inode);
@@ -822,6 +823,57 @@ outrel:			
 	return -EINVAL;
 }
 
+static int ncp_ioctl_need_write(unsigned int cmd)
+{
+	switch (cmd) {
+	case NCP_IOC_GET_FS_INFO:
+	case NCP_IOC_GET_FS_INFO_V2:
+	case NCP_IOC_NCPREQUEST:
+	case NCP_IOC_SETDENTRYTTL:
+	case NCP_IOC_SIGN_INIT:
+	case NCP_IOC_LOCKUNLOCK:
+	case NCP_IOC_SET_SIGN_WANTED:
+		return 1;
+	case NCP_IOC_GETOBJECTNAME:
+	case NCP_IOC_SETOBJECTNAME:
+	case NCP_IOC_GETPRIVATEDATA:
+	case NCP_IOC_SETPRIVATEDATA:
+	case NCP_IOC_SETCHARSETS:
+	case NCP_IOC_GETCHARSETS:
+	case NCP_IOC_CONN_LOGGED_IN:
+	case NCP_IOC_GETDENTRYTTL:
+	case NCP_IOC_GETMOUNTUID2:
+	case NCP_IOC_SIGN_WANTED:
+	case NCP_IOC_GETROOT:
+	case NCP_IOC_SETROOT:
+		return 0;
+	default:
+		/* unkown IOCTL command, assume write */
+		return 1;
+	}
+}
+
+int ncp_ioctl(struct inode *inode, struct file *filp,
+	      unsigned int cmd, unsigned long arg)
+{
+	int ret;
+
+	if (ncp_ioctl_need_write(cmd)) {
+		/*
+		 * inside the ioctl(), any failures which
+		 * are because of file_permission() are
+		 * -EACCESS, so it seems consistent to keep
+		 *  that here.
+		 */
+		if (mnt_want_write(filp->f_vfsmnt))
+			return -EACCES;
+	}
+	ret = __ncp_ioctl(inode, filp, cmd, arg);
+	if (ncp_ioctl_need_write(cmd))
+		mnt_drop_write(filp->f_vfsmnt);
+	return ret;
+}
+
 #ifdef CONFIG_COMPAT
 long ncp_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 15/30] r/o bind mounts: write counts for time functions
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (13 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 14/30] r/o bind mounts: elevate write count for ncp_ioctl() Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 16/30] r/o bind mounts: elevate write count for do_utimes() Dave Hansen
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen



Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/inode.c |   45 ++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 25 deletions(-)

diff -puN fs/inode.c~r-o-bind-mounts-elevate-write-count-for-do_sys_utime-and-touch_atime fs/inode.c
--- linux-2.6.git/fs/inode.c~r-o-bind-mounts-elevate-write-count-for-do_sys_utime-and-touch_atime	2008-02-08 13:04:52.000000000 -0800
+++ linux-2.6.git-dave/fs/inode.c	2008-02-08 13:04:52.000000000 -0800
@@ -1199,42 +1199,37 @@ void touch_atime(struct vfsmount *mnt, s
 	struct inode *inode = dentry->d_inode;
 	struct timespec now;
 
-	if (inode->i_flags & S_NOATIME)
+	if (mnt_want_write(mnt))
 		return;
+	if (inode->i_flags & S_NOATIME)
+		goto out;
 	if (IS_NOATIME(inode))
-		return;
+		goto out;
 	if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode))
-		return;
+		goto out;
 
-	/*
-	 * We may have a NULL vfsmount when coming from NFSD
-	 */
-	if (mnt) {
-		if (mnt->mnt_flags & MNT_NOATIME)
-			return;
-		if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
-			return;
-
-		if (mnt->mnt_flags & MNT_RELATIME) {
-			/*
-			 * With relative atime, only update atime if the
-			 * previous atime is earlier than either the ctime or
-			 * mtime.
-			 */
-			if (timespec_compare(&inode->i_mtime,
-						&inode->i_atime) < 0 &&
-			    timespec_compare(&inode->i_ctime,
-						&inode->i_atime) < 0)
-				return;
-		}
+	if (mnt->mnt_flags & MNT_NOATIME)
+		goto out;
+	if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
+		goto out;
+	if (mnt->mnt_flags & MNT_RELATIME) {
+		/*
+		 * With relative atime, only update atime if the previous
+		 * atime is earlier than either the ctime or mtime.
+		 */
+		if (timespec_compare(&inode->i_mtime, &inode->i_atime) < 0 &&
+		    timespec_compare(&inode->i_ctime, &inode->i_atime) < 0)
+			goto out;
 	}
 
 	now = current_fs_time(inode->i_sb);
 	if (timespec_equal(&inode->i_atime, &now))
-		return;
+		goto out;
 
 	inode->i_atime = now;
 	mark_inode_dirty_sync(inode);
+out:
+	mnt_drop_write(mnt);
 }
 EXPORT_SYMBOL(touch_atime);
 
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 16/30] r/o bind mounts: elevate write count for do_utimes()
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (14 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 15/30] r/o bind mounts: write counts for time functions Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 17/30] r/o bind mounts: write count for file_update_time() Dave Hansen
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


Now includes fix for oops seen by akpm.

"never let a libc developer write your kernel code" - hch

"nor, apparently, a kernel developer" - akpm

Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/utimes.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff -puN fs/utimes.c~r-o-bind-mounts-elevate-write-count-for-do_utimes fs/utimes.c
--- linux-2.6.git/fs/utimes.c~r-o-bind-mounts-elevate-write-count-for-do_utimes	2008-02-08 13:04:53.000000000 -0800
+++ linux-2.6.git-dave/fs/utimes.c	2008-02-08 13:04:53.000000000 -0800
@@ -2,6 +2,7 @@
 #include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/linkage.h>
+#include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/sched.h>
 #include <linux/stat.h>
@@ -59,6 +60,7 @@ long do_utimes(int dfd, char __user *fil
 	struct inode *inode;
 	struct iattr newattrs;
 	struct file *f = NULL;
+	struct vfsmount *mnt;
 
 	error = -EINVAL;
 	if (times && (!nsec_valid(times[0].tv_nsec) ||
@@ -79,18 +81,20 @@ long do_utimes(int dfd, char __user *fil
 		if (!f)
 			goto out;
 		dentry = f->f_path.dentry;
+		mnt = f->f_path.mnt;
 	} else {
 		error = __user_walk_fd(dfd, filename, (flags & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW, &nd);
 		if (error)
 			goto out;
 
 		dentry = nd.dentry;
+		mnt = nd.mnt;
 	}
 
 	inode = dentry->d_inode;
 
-	error = -EROFS;
-	if (IS_RDONLY(inode))
+	error = mnt_want_write(mnt);
+	if (error)
 		goto dput_and_out;
 
 	/* Don't worry, the checks are done in inode_change_ok() */
@@ -98,7 +102,7 @@ long do_utimes(int dfd, char __user *fil
 	if (times) {
 		error = -EPERM;
                 if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
-                        goto dput_and_out;
+			goto mnt_drop_write_and_out;
 
 		if (times[0].tv_nsec == UTIME_OMIT)
 			newattrs.ia_valid &= ~ATTR_ATIME;
@@ -118,22 +122,24 @@ long do_utimes(int dfd, char __user *fil
 	} else {
 		error = -EACCES;
                 if (IS_IMMUTABLE(inode))
-                        goto dput_and_out;
+			goto mnt_drop_write_and_out;
 
 		if (!is_owner_or_cap(inode)) {
 			if (f) {
 				if (!(f->f_mode & FMODE_WRITE))
-					goto dput_and_out;
+					goto mnt_drop_write_and_out;
 			} else {
 				error = vfs_permission(&nd, MAY_WRITE);
 				if (error)
-					goto dput_and_out;
+					goto mnt_drop_write_and_out;
 			}
 		}
 	}
 	mutex_lock(&inode->i_mutex);
 	error = notify_change(dentry, &newattrs);
 	mutex_unlock(&inode->i_mutex);
+mnt_drop_write_and_out:
+	mnt_drop_write(mnt);
 dput_and_out:
 	if (f)
 		fput(f);
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 17/30] r/o bind mounts: write count for file_update_time()
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (15 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 16/30] r/o bind mounts: elevate write count for do_utimes() Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 18/30] r/o bind mounts: write counts for link/symlink Dave Hansen
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen



Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/inode.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff -puN fs/inode.c~r-o-bind-mounts-elevate-write-count-for-file_update_time fs/inode.c
--- linux-2.6.git/fs/inode.c~r-o-bind-mounts-elevate-write-count-for-file_update_time	2008-02-08 13:04:53.000000000 -0800
+++ linux-2.6.git-dave/fs/inode.c	2008-02-08 13:04:53.000000000 -0800
@@ -1250,10 +1250,13 @@ void file_update_time(struct file *file)
 	struct inode *inode = file->f_path.dentry->d_inode;
 	struct timespec now;
 	int sync_it = 0;
+	int err;
 
 	if (IS_NOCMTIME(inode))
 		return;
-	if (IS_RDONLY(inode))
+
+	err = mnt_want_write(file->f_vfsmnt);
+	if (err)
 		return;
 
 	now = current_fs_time(inode->i_sb);
@@ -1274,6 +1277,7 @@ void file_update_time(struct file *file)
 
 	if (sync_it)
 		mark_inode_dirty_sync(inode);
+	mnt_drop_write(file->f_vfsmnt);
 }
 
 EXPORT_SYMBOL(file_update_time);
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 18/30] r/o bind mounts: write counts for link/symlink
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (16 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 17/30] r/o bind mounts: write count for file_update_time() Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 19/30] r/o bind mounts: elevate write count for ioctls() Dave Hansen
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen



Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/namei.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff -puN fs/namei.c~r-o-bind-mounts-elevate-write-count-for-link-and-symlink-calls fs/namei.c
--- linux-2.6.git/fs/namei.c~r-o-bind-mounts-elevate-write-count-for-link-and-symlink-calls	2008-02-08 13:04:54.000000000 -0800
+++ linux-2.6.git-dave/fs/namei.c	2008-02-08 13:04:54.000000000 -0800
@@ -2363,7 +2363,12 @@ asmlinkage long sys_symlinkat(const char
 	if (IS_ERR(dentry))
 		goto out_unlock;
 
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto out_dput;
 	error = vfs_symlink(nd.dentry->d_inode, dentry, from, S_IALLUGO);
+	mnt_drop_write(nd.mnt);
+out_dput:
 	dput(dentry);
 out_unlock:
 	mutex_unlock(&nd.dentry->d_inode->i_mutex);
@@ -2458,7 +2463,12 @@ asmlinkage long sys_linkat(int olddfd, c
 	error = PTR_ERR(new_dentry);
 	if (IS_ERR(new_dentry))
 		goto out_unlock;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto out_dput;
 	error = vfs_link(old_nd.dentry, nd.dentry->d_inode, new_dentry);
+	mnt_drop_write(nd.mnt);
+out_dput:
 	dput(new_dentry);
 out_unlock:
 	mutex_unlock(&nd.dentry->d_inode->i_mutex);
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 19/30] r/o bind mounts: elevate write count for ioctls()
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (17 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 18/30] r/o bind mounts: write counts for link/symlink Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 20/30] r/o bind mounts: elevate write count for open()s Dave Hansen
                   ` (13 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


Some ioctl()s can cause writes to the filesystem.  Take these, and make them
use mnt_want/drop_write() instead.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/ext2/ioctl.c              |   49 +++++++----
 linux-2.6.git-dave/fs/ext3/ioctl.c              |  103 +++++++++++++++---------
 linux-2.6.git-dave/fs/fat/file.c                |   12 +-
 linux-2.6.git-dave/fs/hfsplus/ioctl.c           |   40 +++++----
 linux-2.6.git-dave/fs/jfs/ioctl.c               |   35 +++++---
 linux-2.6.git-dave/fs/ocfs2/ioctl.c             |   11 +-
 linux-2.6.git-dave/fs/reiserfs/ioctl.c          |   65 +++++++++------
 linux-2.6.git-dave/fs/xfs/linux-2.6/xfs_ioctl.c |   15 ++-
 8 files changed, 213 insertions(+), 117 deletions(-)

diff -puN fs/ext2/ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls fs/ext2/ioctl.c
--- linux-2.6.git/fs/ext2/ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls	2008-02-08 13:04:54.000000000 -0800
+++ linux-2.6.git-dave/fs/ext2/ioctl.c	2008-02-08 13:04:54.000000000 -0800
@@ -12,6 +12,7 @@
 #include <linux/time.h>
 #include <linux/sched.h>
 #include <linux/compat.h>
+#include <linux/mount.h>
 #include <linux/smp_lock.h>
 #include <asm/current.h>
 #include <asm/uaccess.h>
@@ -23,6 +24,7 @@ long ext2_ioctl(struct file *filp, unsig
 	struct ext2_inode_info *ei = EXT2_I(inode);
 	unsigned int flags;
 	unsigned short rsv_window_size;
+	int ret;
 
 	ext2_debug ("cmd = %u, arg = %lu\n", cmd, arg);
 
@@ -34,14 +36,19 @@ long ext2_ioctl(struct file *filp, unsig
 	case EXT2_IOC_SETFLAGS: {
 		unsigned int oldflags;
 
-		if (IS_RDONLY(inode))
-			return -EROFS;
-
-		if (!is_owner_or_cap(inode))
-			return -EACCES;
+		ret = mnt_want_write(filp->f_vfsmnt);
+		if (ret)
+			return ret;
+
+		if (!is_owner_or_cap(inode)) {
+			ret = -EACCES;
+			goto setflags_out;
+		}
 
-		if (get_user(flags, (int __user *) arg))
-			return -EFAULT;
+		if (get_user(flags, (int __user *) arg)) {
+			ret = -EFAULT;
+			goto setflags_out;
+		}
 
 		if (!S_ISDIR(inode->i_mode))
 			flags &= ~EXT2_DIRSYNC_FL;
@@ -50,7 +57,8 @@ long ext2_ioctl(struct file *filp, unsig
 		/* Is it quota file? Do not allow user to mess with it */
 		if (IS_NOQUOTA(inode)) {
 			mutex_unlock(&inode->i_mutex);
-			return -EPERM;
+			ret = -EPERM;
+			goto setflags_out;
 		}
 		oldflags = ei->i_flags;
 
@@ -63,7 +71,8 @@ long ext2_ioctl(struct file *filp, unsig
 		if ((flags ^ oldflags) & (EXT2_APPEND_FL | EXT2_IMMUTABLE_FL)) {
 			if (!capable(CAP_LINUX_IMMUTABLE)) {
 				mutex_unlock(&inode->i_mutex);
-				return -EPERM;
+				ret = -EPERM;
+				goto setflags_out;
 			}
 		}
 
@@ -75,20 +84,26 @@ long ext2_ioctl(struct file *filp, unsig
 		ext2_set_inode_flags(inode);
 		inode->i_ctime = CURRENT_TIME_SEC;
 		mark_inode_dirty(inode);
-		return 0;
+setflags_out:
+		mnt_drop_write(filp->f_vfsmnt);
+		return ret;
 	}
 	case EXT2_IOC_GETVERSION:
 		return put_user(inode->i_generation, (int __user *) arg);
 	case EXT2_IOC_SETVERSION:
 		if (!is_owner_or_cap(inode))
 			return -EPERM;
-		if (IS_RDONLY(inode))
-			return -EROFS;
-		if (get_user(inode->i_generation, (int __user *) arg))
-			return -EFAULT;	
-		inode->i_ctime = CURRENT_TIME_SEC;
-		mark_inode_dirty(inode);
-		return 0;
+		ret = mnt_want_write(filp->f_vfsmnt);
+		if (ret)
+			return ret;
+		if (get_user(inode->i_generation, (int __user *) arg)) {
+			ret = -EFAULT;
+		} else {
+			inode->i_ctime = CURRENT_TIME_SEC;
+			mark_inode_dirty(inode);
+		}
+		mnt_drop_write(filp->f_vfsmnt);
+		return ret;
 	case EXT2_IOC_GETRSVSZ:
 		if (test_opt(inode->i_sb, RESERVATION)
 			&& S_ISREG(inode->i_mode)
diff -puN fs/ext3/ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls fs/ext3/ioctl.c
--- linux-2.6.git/fs/ext3/ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls	2008-02-08 13:04:54.000000000 -0800
+++ linux-2.6.git-dave/fs/ext3/ioctl.c	2008-02-08 13:04:54.000000000 -0800
@@ -12,6 +12,7 @@
 #include <linux/capability.h>
 #include <linux/ext3_fs.h>
 #include <linux/ext3_jbd.h>
+#include <linux/mount.h>
 #include <linux/time.h>
 #include <linux/compat.h>
 #include <linux/smp_lock.h>
@@ -38,14 +39,19 @@ int ext3_ioctl (struct inode * inode, st
 		unsigned int oldflags;
 		unsigned int jflag;
 
-		if (IS_RDONLY(inode))
-			return -EROFS;
+		err = mnt_want_write(filp->f_vfsmnt);
+		if (err)
+			return err;
 
-		if (!is_owner_or_cap(inode))
-			return -EACCES;
+		if (!is_owner_or_cap(inode)) {
+			err = -EACCES;
+			goto flags_out;
+		}
 
-		if (get_user(flags, (int __user *) arg))
-			return -EFAULT;
+		if (get_user(flags, (int __user *) arg)) {
+			err = -EFAULT;
+			goto flags_out;
+		}
 
 		if (!S_ISDIR(inode->i_mode))
 			flags &= ~EXT3_DIRSYNC_FL;
@@ -54,7 +60,8 @@ int ext3_ioctl (struct inode * inode, st
 		/* Is it quota file? Do not allow user to mess with it */
 		if (IS_NOQUOTA(inode)) {
 			mutex_unlock(&inode->i_mutex);
-			return -EPERM;
+			err = -EPERM;
+			goto flags_out;
 		}
 		oldflags = ei->i_flags;
 
@@ -70,7 +77,8 @@ int ext3_ioctl (struct inode * inode, st
 		if ((flags ^ oldflags) & (EXT3_APPEND_FL | EXT3_IMMUTABLE_FL)) {
 			if (!capable(CAP_LINUX_IMMUTABLE)) {
 				mutex_unlock(&inode->i_mutex);
-				return -EPERM;
+				err = -EPERM;
+				goto flags_out;
 			}
 		}
 
@@ -81,7 +89,8 @@ int ext3_ioctl (struct inode * inode, st
 		if ((jflag ^ oldflags) & (EXT3_JOURNAL_DATA_FL)) {
 			if (!capable(CAP_SYS_RESOURCE)) {
 				mutex_unlock(&inode->i_mutex);
-				return -EPERM;
+				err = -EPERM;
+				goto flags_out;
 			}
 		}
 
@@ -89,7 +98,8 @@ int ext3_ioctl (struct inode * inode, st
 		handle = ext3_journal_start(inode, 1);
 		if (IS_ERR(handle)) {
 			mutex_unlock(&inode->i_mutex);
-			return PTR_ERR(handle);
+			err = PTR_ERR(handle);
+			goto flags_out;
 		}
 		if (IS_SYNC(inode))
 			handle->h_sync = 1;
@@ -115,6 +125,8 @@ flags_err:
 		if ((jflag ^ oldflags) & (EXT3_JOURNAL_DATA_FL))
 			err = ext3_change_inode_journal_flag(inode, jflag);
 		mutex_unlock(&inode->i_mutex);
+flags_out:
+		mnt_drop_write(filp->f_vfsmnt);
 		return err;
 	}
 	case EXT3_IOC_GETVERSION:
@@ -129,14 +141,18 @@ flags_err:
 
 		if (!is_owner_or_cap(inode))
 			return -EPERM;
-		if (IS_RDONLY(inode))
-			return -EROFS;
-		if (get_user(generation, (int __user *) arg))
-			return -EFAULT;
-
+		err = mnt_want_write(filp->f_vfsmnt);
+		if (err)
+			return err;
+		if (get_user(generation, (int __user *) arg)) {
+			err = -EFAULT;
+			goto setversion_out;
+		}
 		handle = ext3_journal_start(inode, 1);
-		if (IS_ERR(handle))
-			return PTR_ERR(handle);
+		if (IS_ERR(handle)) {
+			err = PTR_ERR(handle);
+			goto setversion_out;
+		}
 		err = ext3_reserve_inode_write(handle, inode, &iloc);
 		if (err == 0) {
 			inode->i_ctime = CURRENT_TIME_SEC;
@@ -144,6 +160,8 @@ flags_err:
 			err = ext3_mark_iloc_dirty(handle, inode, &iloc);
 		}
 		ext3_journal_stop(handle);
+setversion_out:
+		mnt_drop_write(filp->f_vfsmnt);
 		return err;
 	}
 #ifdef CONFIG_JBD_DEBUG
@@ -179,18 +197,24 @@ flags_err:
 		}
 		return -ENOTTY;
 	case EXT3_IOC_SETRSVSZ: {
+		int err;
 
 		if (!test_opt(inode->i_sb, RESERVATION) ||!S_ISREG(inode->i_mode))
 			return -ENOTTY;
 
-		if (IS_RDONLY(inode))
-			return -EROFS;
+		err = mnt_want_write(filp->f_vfsmnt);
+		if (err)
+			return err;
 
-		if (!is_owner_or_cap(inode))
-			return -EACCES;
+		if (!is_owner_or_cap(inode)) {
+			err = -EACCES;
+			goto setrsvsz_out;
+		}
 
-		if (get_user(rsv_window_size, (int __user *)arg))
-			return -EFAULT;
+		if (get_user(rsv_window_size, (int __user *)arg)) {
+			err = -EFAULT;
+			goto setrsvsz_out;
+		}
 
 		if (rsv_window_size > EXT3_MAX_RESERVE_BLOCKS)
 			rsv_window_size = EXT3_MAX_RESERVE_BLOCKS;
@@ -208,7 +232,9 @@ flags_err:
 			rsv->rsv_goal_size = rsv_window_size;
 		}
 		mutex_unlock(&ei->truncate_mutex);
-		return 0;
+setrsvsz_out:
+		mnt_drop_write(filp->f_vfsmnt);
+		return err;
 	}
 	case EXT3_IOC_GROUP_EXTEND: {
 		ext3_fsblk_t n_blocks_count;
@@ -218,17 +244,20 @@ flags_err:
 		if (!capable(CAP_SYS_RESOURCE))
 			return -EPERM;
 
-		if (IS_RDONLY(inode))
-			return -EROFS;
-
-		if (get_user(n_blocks_count, (__u32 __user *)arg))
-			return -EFAULT;
+		err = mnt_want_write(filp->f_vfsmnt);
+		if (err)
+			return err;
 
+		if (get_user(n_blocks_count, (__u32 __user *)arg)) {
+			err = -EFAULT;
+			goto group_extend_out;
+		}
 		err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
 		journal_lock_updates(EXT3_SB(sb)->s_journal);
 		journal_flush(EXT3_SB(sb)->s_journal);
 		journal_unlock_updates(EXT3_SB(sb)->s_journal);
-
+group_extend_out:
+		mnt_drop_write(filp->f_vfsmnt);
 		return err;
 	}
 	case EXT3_IOC_GROUP_ADD: {
@@ -239,18 +268,22 @@ flags_err:
 		if (!capable(CAP_SYS_RESOURCE))
 			return -EPERM;
 
-		if (IS_RDONLY(inode))
-			return -EROFS;
+		err = mnt_want_write(filp->f_vfsmnt);
+		if (err)
+			return err;
 
 		if (copy_from_user(&input, (struct ext3_new_group_input __user *)arg,
-				sizeof(input)))
-			return -EFAULT;
+				sizeof(input))) {
+			err = -EFAULT;
+			goto group_add_out;
+		}
 
 		err = ext3_group_add(sb, &input);
 		journal_lock_updates(EXT3_SB(sb)->s_journal);
 		journal_flush(EXT3_SB(sb)->s_journal);
 		journal_unlock_updates(EXT3_SB(sb)->s_journal);
-
+group_add_out:
+		mnt_drop_write(filp->f_vfsmnt);
 		return err;
 	}
 
diff -puN fs/fat/file.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls fs/fat/file.c
--- linux-2.6.git/fs/fat/file.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls	2008-02-08 13:04:54.000000000 -0800
+++ linux-2.6.git-dave/fs/fat/file.c	2008-02-08 13:04:54.000000000 -0800
@@ -8,6 +8,7 @@
 
 #include <linux/capability.h>
 #include <linux/module.h>
+#include <linux/mount.h>
 #include <linux/time.h>
 #include <linux/msdos_fs.h>
 #include <linux/smp_lock.h>
@@ -46,10 +47,9 @@ int fat_generic_ioctl(struct inode *inod
 
 		mutex_lock(&inode->i_mutex);
 
-		if (IS_RDONLY(inode)) {
-			err = -EROFS;
-			goto up;
-		}
+		err = mnt_want_write(filp->f_vfsmnt);
+		if (err)
+			goto up_no_drop_write;
 
 		/*
 		 * ATTR_VOLUME and ATTR_DIR cannot be changed; this also
@@ -105,7 +105,9 @@ int fat_generic_ioctl(struct inode *inod
 
 		MSDOS_I(inode)->i_attrs = attr & ATTR_UNUSED;
 		mark_inode_dirty(inode);
-	up:
+up:
+		mnt_drop_write(filp->f_vfsmnt);
+up_no_drop_write:
 		mutex_unlock(&inode->i_mutex);
 		return err;
 	}
diff -puN fs/hfsplus/ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls fs/hfsplus/ioctl.c
--- linux-2.6.git/fs/hfsplus/ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls	2008-02-08 13:04:54.000000000 -0800
+++ linux-2.6.git-dave/fs/hfsplus/ioctl.c	2008-02-08 13:04:54.000000000 -0800
@@ -14,6 +14,7 @@
 
 #include <linux/capability.h>
 #include <linux/fs.h>
+#include <linux/mount.h>
 #include <linux/sched.h>
 #include <linux/xattr.h>
 #include <asm/uaccess.h>
@@ -35,25 +36,32 @@ int hfsplus_ioctl(struct inode *inode, s
 			flags |= FS_NODUMP_FL; /* EXT2_NODUMP_FL */
 		return put_user(flags, (int __user *)arg);
 	case HFSPLUS_IOC_EXT2_SETFLAGS: {
-		if (IS_RDONLY(inode))
-			return -EROFS;
-
-		if (!is_owner_or_cap(inode))
-			return -EACCES;
-
-		if (get_user(flags, (int __user *)arg))
-			return -EFAULT;
-
+		int err = 0;
+		err = mnt_want_write(filp->f_vfsmnt);
+		if (err)
+			return err;
+
+		if (!is_owner_or_cap(inode)) {
+			err = -EACCES;
+			goto setflags_out;
+		}
+		if (get_user(flags, (int __user *)arg)) {
+			err = -EFAULT;
+			goto setflags_out;
+		}
 		if (flags & (FS_IMMUTABLE_FL|FS_APPEND_FL) ||
 		    HFSPLUS_I(inode).rootflags & (HFSPLUS_FLG_IMMUTABLE|HFSPLUS_FLG_APPEND)) {
-			if (!capable(CAP_LINUX_IMMUTABLE))
-				return -EPERM;
+			if (!capable(CAP_LINUX_IMMUTABLE)) {
+				err = -EPERM;
+				goto setflags_out;
+			}
 		}
 
 		/* don't silently ignore unsupported ext2 flags */
-		if (flags & ~(FS_IMMUTABLE_FL|FS_APPEND_FL|FS_NODUMP_FL))
-			return -EOPNOTSUPP;
-
+		if (flags & ~(FS_IMMUTABLE_FL|FS_APPEND_FL|FS_NODUMP_FL)) {
+			err = -EOPNOTSUPP;
+			goto setflags_out;
+		}
 		if (flags & FS_IMMUTABLE_FL) { /* EXT2_IMMUTABLE_FL */
 			inode->i_flags |= S_IMMUTABLE;
 			HFSPLUS_I(inode).rootflags |= HFSPLUS_FLG_IMMUTABLE;
@@ -75,7 +83,9 @@ int hfsplus_ioctl(struct inode *inode, s
 
 		inode->i_ctime = CURRENT_TIME_SEC;
 		mark_inode_dirty(inode);
-		return 0;
+setflags_out:
+		mnt_drop_write(filp->f_vfsmnt);
+		return err;
 	}
 	default:
 		return -ENOTTY;
diff -puN fs/jfs/ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls fs/jfs/ioctl.c
--- linux-2.6.git/fs/jfs/ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls	2008-02-08 13:04:54.000000000 -0800
+++ linux-2.6.git-dave/fs/jfs/ioctl.c	2008-02-08 13:04:54.000000000 -0800
@@ -8,6 +8,7 @@
 #include <linux/fs.h>
 #include <linux/ctype.h>
 #include <linux/capability.h>
+#include <linux/mount.h>
 #include <linux/time.h>
 #include <linux/sched.h>
 #include <asm/current.h>
@@ -65,23 +66,30 @@ long jfs_ioctl(struct file *filp, unsign
 		return put_user(flags, (int __user *) arg);
 	case JFS_IOC_SETFLAGS: {
 		unsigned int oldflags;
+		int err;
 
-		if (IS_RDONLY(inode))
-			return -EROFS;
-
-		if (!is_owner_or_cap(inode))
-			return -EACCES;
-
-		if (get_user(flags, (int __user *) arg))
-			return -EFAULT;
+		err = mnt_want_write(filp->f_vfsmnt);
+		if (err)
+			return err;
+
+		if (!is_owner_or_cap(inode)) {
+			err = -EACCES;
+			goto setflags_out;
+		}
+		if (get_user(flags, (int __user *) arg)) {
+			err = -EFAULT;
+			goto setflags_out;
+		}
 
 		flags = jfs_map_ext2(flags, 1);
 		if (!S_ISDIR(inode->i_mode))
 			flags &= ~JFS_DIRSYNC_FL;
 
 		/* Is it quota file? Do not allow user to mess with it */
-		if (IS_NOQUOTA(inode))
-			return -EPERM;
+		if (IS_NOQUOTA(inode)) {
+			err = -EPERM;
+			goto setflags_out;
+		}
 
 		/* Lock against other parallel changes of flags */
 		mutex_lock(&inode->i_mutex);
@@ -98,7 +106,8 @@ long jfs_ioctl(struct file *filp, unsign
 			(JFS_APPEND_FL | JFS_IMMUTABLE_FL))) {
 			if (!capable(CAP_LINUX_IMMUTABLE)) {
 				mutex_unlock(&inode->i_mutex);
-				return -EPERM;
+				err = -EPERM;
+				goto setflags_out;
 			}
 		}
 
@@ -110,7 +119,9 @@ long jfs_ioctl(struct file *filp, unsign
 		mutex_unlock(&inode->i_mutex);
 		inode->i_ctime = CURRENT_TIME_SEC;
 		mark_inode_dirty(inode);
-		return 0;
+setflags_out:
+		mnt_drop_write(filp->f_vfsmnt);
+		return err;
 	}
 	default:
 		return -ENOTTY;
diff -puN fs/ocfs2/ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls fs/ocfs2/ioctl.c
--- linux-2.6.git/fs/ocfs2/ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls	2008-02-08 13:04:54.000000000 -0800
+++ linux-2.6.git-dave/fs/ocfs2/ioctl.c	2008-02-08 13:04:54.000000000 -0800
@@ -59,10 +59,6 @@ static int ocfs2_set_inode_attr(struct i
 		goto bail;
 	}
 
-	status = -EROFS;
-	if (IS_RDONLY(inode))
-		goto bail_unlock;
-
 	status = -EACCES;
 	if (!is_owner_or_cap(inode))
 		goto bail_unlock;
@@ -133,8 +129,13 @@ int ocfs2_ioctl(struct inode * inode, st
 		if (get_user(flags, (int __user *) arg))
 			return -EFAULT;
 
-		return ocfs2_set_inode_attr(inode, flags,
+		status = mnt_want_write(filp->f_vfsmnt);
+		if (status)
+			return status;
+		status = ocfs2_set_inode_attr(inode, flags,
 			OCFS2_FL_MODIFIABLE);
+		mnt_drop_write(filp->f_vfsmnt);
+		return status;
 	case OCFS2_IOC_RESVSP:
 	case OCFS2_IOC_RESVSP64:
 	case OCFS2_IOC_UNRESVSP:
diff -puN fs/reiserfs/ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls fs/reiserfs/ioctl.c
--- linux-2.6.git/fs/reiserfs/ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls	2008-02-08 13:04:54.000000000 -0800
+++ linux-2.6.git-dave/fs/reiserfs/ioctl.c	2008-02-08 13:04:54.000000000 -0800
@@ -4,6 +4,7 @@
 
 #include <linux/capability.h>
 #include <linux/fs.h>
+#include <linux/mount.h>
 #include <linux/reiserfs_fs.h>
 #include <linux/time.h>
 #include <asm/uaccess.h>
@@ -25,6 +26,7 @@ int reiserfs_ioctl(struct inode *inode, 
 		   unsigned long arg)
 {
 	unsigned int flags;
+	int err = 0;
 
 	switch (cmd) {
 	case REISERFS_IOC_UNPACK:
@@ -48,50 +50,67 @@ int reiserfs_ioctl(struct inode *inode, 
 			if (!reiserfs_attrs(inode->i_sb))
 				return -ENOTTY;
 
-			if (IS_RDONLY(inode))
-				return -EROFS;
-
-			if (!is_owner_or_cap(inode))
-				return -EPERM;
-
-			if (get_user(flags, (int __user *)arg))
-				return -EFAULT;
-
-			/* Is it quota file? Do not allow user to mess with it. */
-			if (IS_NOQUOTA(inode))
-				return -EPERM;
+			err = mnt_want_write(filp->f_vfsmnt);
+			if (err)
+				return err;
+
+			if (!is_owner_or_cap(inode)) {
+				err = -EPERM;
+				goto setflags_out;
+			}
+			if (get_user(flags, (int __user *)arg)) {
+				err = -EFAULT;
+				goto setflags_out;
+			}
+			/*
+			 * Is it quota file? Do not allow user to mess with it
+			 */
+			if (IS_NOQUOTA(inode)) {
+				err = -EPERM;
+				goto setflags_out;
+			}
 			if (((flags ^ REISERFS_I(inode)->
 			      i_attrs) & (REISERFS_IMMUTABLE_FL |
 					  REISERFS_APPEND_FL))
-			    && !capable(CAP_LINUX_IMMUTABLE))
-				return -EPERM;
-
+			    && !capable(CAP_LINUX_IMMUTABLE)) {
+				err = -EPERM;
+				goto setflags_out;
+			}
 			if ((flags & REISERFS_NOTAIL_FL) &&
 			    S_ISREG(inode->i_mode)) {
 				int result;
 
 				result = reiserfs_unpack(inode, filp);
-				if (result)
-					return result;
+				if (result) {
+					err = result;
+					goto setflags_out;
+				}
 			}
 			sd_attrs_to_i_attrs(flags, inode);
 			REISERFS_I(inode)->i_attrs = flags;
 			inode->i_ctime = CURRENT_TIME_SEC;
 			mark_inode_dirty(inode);
-			return 0;
+setflags_out:
+			mnt_drop_write(filp->f_vfsmnt);
+			return err;
 		}
 	case REISERFS_IOC_GETVERSION:
 		return put_user(inode->i_generation, (int __user *)arg);
 	case REISERFS_IOC_SETVERSION:
 		if (!is_owner_or_cap(inode))
 			return -EPERM;
-		if (IS_RDONLY(inode))
-			return -EROFS;
-		if (get_user(inode->i_generation, (int __user *)arg))
-			return -EFAULT;
+		err = mnt_want_write(filp->f_vfsmnt);
+		if (err)
+			return err;
+		if (get_user(inode->i_generation, (int __user *)arg)) {
+			err = -EFAULT;
+			goto setversion_out;
+		}
 		inode->i_ctime = CURRENT_TIME_SEC;
 		mark_inode_dirty(inode);
-		return 0;
+setversion_out:
+		mnt_drop_write(filp->f_vfsmnt);
+		return err;
 	default:
 		return -ENOTTY;
 	}
diff -puN fs/xfs/linux-2.6/xfs_ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls fs/xfs/linux-2.6/xfs_ioctl.c
--- linux-2.6.git/fs/xfs/linux-2.6/xfs_ioctl.c~r-o-bind-mounts-elevate-write-count-for-some-ioctls	2008-02-08 13:04:54.000000000 -0800
+++ linux-2.6.git-dave/fs/xfs/linux-2.6/xfs_ioctl.c	2008-02-08 13:04:54.000000000 -0800
@@ -535,8 +535,6 @@ xfs_attrmulti_attr_set(
 	char			*kbuf;
 	int			error = EFAULT;
 
-	if (IS_RDONLY(inode))
-		return -EROFS;
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
 		return EPERM;
 	if (len > XATTR_SIZE_MAX)
@@ -562,8 +560,6 @@ xfs_attrmulti_attr_remove(
 	char			*name,
 	__uint32_t		flags)
 {
-	if (IS_RDONLY(inode))
-		return -EROFS;
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
 		return EPERM;
 	return xfs_attr_remove(XFS_I(inode), name, flags);
@@ -573,6 +569,7 @@ STATIC int
 xfs_attrmulti_by_handle(
 	xfs_mount_t		*mp,
 	void			__user *arg,
+	struct file		*parfilp,
 	struct inode		*parinode)
 {
 	int			error;
@@ -626,13 +623,21 @@ xfs_attrmulti_by_handle(
 					&ops[i].am_length, ops[i].am_flags);
 			break;
 		case ATTR_OP_SET:
+			ops[i].am_error = mnt_want_write(parfilp->f_vfsmnt);
+			if (ops[i].am_error)
+				break;
 			ops[i].am_error = xfs_attrmulti_attr_set(inode,
 					attr_name, ops[i].am_attrvalue,
 					ops[i].am_length, ops[i].am_flags);
+			mnt_drop_write(parfilp->f_vfsmnt);
 			break;
 		case ATTR_OP_REMOVE:
+			ops[i].am_error = mnt_want_write(parfilp->f_vfsmnt);
+			if (ops[i].am_error)
+				break;
 			ops[i].am_error = xfs_attrmulti_attr_remove(inode,
 					attr_name, ops[i].am_flags);
+			mnt_drop_write(parfilp->f_vfsmnt);
 			break;
 		default:
 			ops[i].am_error = EINVAL;
@@ -811,7 +816,7 @@ xfs_ioctl(
 		return xfs_attrlist_by_handle(mp, arg, inode);
 
 	case XFS_IOC_ATTRMULTI_BY_HANDLE:
-		return xfs_attrmulti_by_handle(mp, arg, inode);
+		return xfs_attrmulti_by_handle(mp, arg, filp, inode);
 
 	case XFS_IOC_SWAPEXT: {
 		error = xfs_swapext((struct xfs_swapext __user *)arg);
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 20/30] r/o bind mounts: elevate write count for open()s
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (18 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 19/30] r/o bind mounts: elevate write count for ioctls() Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 21/30] r/o bind mounts: get write access for vfs_rename() callers Dave Hansen
                   ` (12 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


This is the first really tricky patch in the series.  It elevates the writer
count on a mount each time a non-special file is opened for write.

We used to do this in may_open(), but Miklos pointed out that __dentry_open()
is used as well to create filps.  This will cover even those cases, while a
call in may_open() would not have.

There is also an elevated count around the vfs_create() call in open_namei(). 
See the comments for more details, but we need this to fix a 'create, remount,
fail r/w open()' race.

Some filesystems forego the use of normal vfs calls to create
struct files.   Make sure that these users elevate the mnt
writer count because they will get __fput(), and we need
to make sure they're balanced.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/file_table.c |   14 ++++++
 linux-2.6.git-dave/fs/namei.c      |   75 ++++++++++++++++++++++++++++++++-----
 linux-2.6.git-dave/fs/open.c       |   36 ++++++++++++++++-
 linux-2.6.git-dave/ipc/mqueue.c    |   16 ++++++-
 4 files changed, 127 insertions(+), 14 deletions(-)

diff -puN fs/file_table.c~r-o-bind-mounts-elevate-write-count-opened-files fs/file_table.c
--- linux-2.6.git/fs/file_table.c~r-o-bind-mounts-elevate-write-count-opened-files	2008-02-08 13:04:55.000000000 -0800
+++ linux-2.6.git-dave/fs/file_table.c	2008-02-08 13:04:55.000000000 -0800
@@ -199,6 +199,17 @@ int init_file(struct file *file, struct 
 	file->f_mapping = dentry->d_inode->i_mapping;
 	file->f_mode = mode;
 	file->f_op = fop;
+
+	/*
+	 * These mounts don't really matter in practice
+	 * for r/o bind mounts.  They aren't userspace-
+	 * visible.  We do this for consistency, and so
+	 * that we can do debugging checks at __fput()
+	 */
+	if ((mode & FMODE_WRITE) && !special_file(dentry->d_inode->i_mode)) {
+		error = mnt_want_write(mnt);
+		WARN_ON(error);
+	}
 	return error;
 }
 EXPORT_SYMBOL(init_file);
@@ -221,10 +232,13 @@ EXPORT_SYMBOL(fput);
  */
 void drop_file_write_access(struct file *file)
 {
+	struct vfsmount *mnt = file->f_path.mnt;
 	struct dentry *dentry = file->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
 
 	put_write_access(inode);
+	if (!special_file(inode->i_mode))
+		mnt_drop_write(mnt);
 }
 EXPORT_SYMBOL_GPL(drop_file_write_access);
 
diff -puN fs/namei.c~r-o-bind-mounts-elevate-write-count-opened-files fs/namei.c
--- linux-2.6.git/fs/namei.c~r-o-bind-mounts-elevate-write-count-opened-files	2008-02-08 13:04:55.000000000 -0800
+++ linux-2.6.git-dave/fs/namei.c	2008-02-08 13:04:55.000000000 -0800
@@ -1620,8 +1620,7 @@ int may_open(struct nameidata *nd, int a
 			return -EACCES;
 
 		flag &= ~O_TRUNC;
-	} else if (IS_RDONLY(inode) && (acc_mode & MAY_WRITE))
-		return -EROFS;
+	}
 
 	error = vfs_permission(nd, acc_mode);
 	if (error)
@@ -1721,18 +1720,32 @@ static inline int open_to_namei_flags(in
 	return flag;
 }
 
+static int open_will_write_to_fs(int flag, struct inode *inode)
+{
+	/*
+	 * We'll never write to the fs underlying
+	 * a device file.
+	 */
+	if (special_file(inode->i_mode))
+		return 0;
+	return (flag & O_TRUNC);
+}
+
 /*
- * Note that the low bits of "flag" aren't the same as in the open
- * system call.  See open_to_namei_flags().
+ * Note that the low bits of the passed in "open_flag"
+ * are not the same as in the local variable "flag". See
+ * open_to_namei_flags() for more details.
  */
 struct file *do_filp_open(int dfd, const char *pathname,
 		int open_flag, int mode)
 {
+	struct file *filp;
 	struct nameidata nd;
 	int acc_mode, error;
 	struct path path;
 	struct dentry *dir;
 	int count = 0;
+	int will_write;
 	int flag = open_to_namei_flags(open_flag);
 
 	acc_mode = ACC_MODE(flag);
@@ -1788,17 +1801,30 @@ do_last:
 	}
 
 	if (IS_ERR(nd.intent.open.file)) {
-		mutex_unlock(&dir->d_inode->i_mutex);
 		error = PTR_ERR(nd.intent.open.file);
-		goto exit_dput;
+		goto exit_mutex_unlock;
 	}
 
 	/* Negative dentry, just create the file */
 	if (!path.dentry->d_inode) {
-		error = __open_namei_create(&nd, &path, flag, mode);
+		/*
+		 * This write is needed to ensure that a
+		 * ro->rw transition does not occur between
+		 * the time when the file is created and when
+		 * a permanent write count is taken through
+		 * the 'struct file' in nameidata_to_filp().
+		 */
+		error = mnt_want_write(nd.mnt);
 		if (error)
+			goto exit_mutex_unlock;
+		error = __open_namei_create(&nd, &path, flag, mode);
+		if (error) {
+			mnt_drop_write(nd.mnt);
 			goto exit;
-		return nameidata_to_filp(&nd, open_flag);
+		}
+		filp = nameidata_to_filp(&nd, open_flag);
+		mnt_drop_write(nd.mnt);
+		return filp;
 	}
 
 	/*
@@ -1828,11 +1854,40 @@ do_last:
 	if (path.dentry->d_inode && S_ISDIR(path.dentry->d_inode->i_mode))
 		goto exit;
 ok:
+	/*
+	 * Consider:
+	 * 1. may_open() truncates a file
+	 * 2. a rw->ro mount transition occurs
+	 * 3. nameidata_to_filp() fails due to
+	 *    the ro mount.
+	 * That would be inconsistent, and should
+	 * be avoided. Taking this mnt write here
+	 * ensures that (2) can not occur.
+	 */
+	will_write = open_will_write_to_fs(flag, nd.dentry->d_inode);
+	if (will_write) {
+		error = mnt_want_write(nd.mnt);
+		if (error)
+			goto exit;
+	}
 	error = may_open(&nd, acc_mode, flag);
-	if (error)
+	if (error) {
+		if (will_write)
+			mnt_drop_write(nd.mnt);
 		goto exit;
-	return nameidata_to_filp(&nd, open_flag);
+	}
+	filp = nameidata_to_filp(&nd, open_flag);
+	/*
+	 * It is now safe to drop the mnt write
+	 * because the filp has had a write taken
+	 * on its behalf.
+	 */
+	if (will_write)
+		mnt_drop_write(nd.mnt);
+	return filp;
 
+exit_mutex_unlock:
+	mutex_unlock(&dir->d_inode->i_mutex);
 exit_dput:
 	dput_path(&path, &nd);
 exit:
diff -puN fs/open.c~r-o-bind-mounts-elevate-write-count-opened-files fs/open.c
--- linux-2.6.git/fs/open.c~r-o-bind-mounts-elevate-write-count-opened-files	2008-02-08 13:04:55.000000000 -0800
+++ linux-2.6.git-dave/fs/open.c	2008-02-08 13:04:55.000000000 -0800
@@ -734,6 +734,35 @@ out:
 	return error;
 }
 
+/*
+ * You have to be very careful that these write
+ * counts get cleaned up in error cases and
+ * upon __fput().  This should probably never
+ * be called outside of __dentry_open().
+ */
+static inline int __get_file_write_access(struct inode *inode,
+					  struct vfsmount *mnt)
+{
+	int error;
+	error = get_write_access(inode);
+	if (error)
+		return error;
+	/*
+	 * Do not take mount writer counts on
+	 * special files since no writes to
+	 * the mount itself will occur.
+	 */
+	if (!special_file(inode->i_mode)) {
+		/*
+		 * Balanced in __fput()
+		 */
+		error = mnt_want_write(mnt);
+		if (error)
+			put_write_access(inode);
+	}
+	return error;
+}
+
 static struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
 					int flags, struct file *f,
 					int (*open)(struct inode *, struct file *))
@@ -746,7 +775,7 @@ static struct file *__dentry_open(struct
 				FMODE_PREAD | FMODE_PWRITE;
 	inode = dentry->d_inode;
 	if (f->f_mode & FMODE_WRITE) {
-		error = get_write_access(inode);
+		error = __get_file_write_access(inode, mnt);
 		if (error)
 			goto cleanup_file;
 	}
@@ -788,8 +817,11 @@ static struct file *__dentry_open(struct
 
 cleanup_all:
 	fops_put(f->f_op);
-	if (f->f_mode & FMODE_WRITE)
+	if (f->f_mode & FMODE_WRITE) {
 		put_write_access(inode);
+		if (!special_file(inode->i_mode))
+			mnt_drop_write(mnt);
+	}
 	file_kill(f);
 	f->f_path.dentry = NULL;
 	f->f_path.mnt = NULL;
diff -puN ipc/mqueue.c~r-o-bind-mounts-elevate-write-count-opened-files ipc/mqueue.c
--- linux-2.6.git/ipc/mqueue.c~r-o-bind-mounts-elevate-write-count-opened-files	2008-02-08 13:04:55.000000000 -0800
+++ linux-2.6.git-dave/ipc/mqueue.c	2008-02-08 13:04:55.000000000 -0800
@@ -599,6 +599,7 @@ static struct file *do_create(struct den
 			int oflag, mode_t mode, struct mq_attr __user *u_attr)
 {
 	struct mq_attr attr;
+	struct file *result;
 	int ret;
 
 	if (u_attr) {
@@ -613,13 +614,24 @@ static struct file *do_create(struct den
 	}
 
 	mode &= ~current->fs->umask;
+	ret = mnt_want_write(mqueue_mnt);
+	if (ret)
+		goto out;
 	ret = vfs_create(dir->d_inode, dentry, mode, NULL);
 	dentry->d_fsdata = NULL;
 	if (ret)
-		goto out;
+		goto out_drop_write;
 
-	return dentry_open(dentry, mqueue_mnt, oflag);
+	result = dentry_open(dentry, mqueue_mnt, oflag);
+	/*
+	 * dentry_open() took a persistent mnt_want_write(),
+	 * so we can now drop this one.
+	 */
+	mnt_drop_write(mqueue_mnt);
+	return result;
 
+out_drop_write:
+	mnt_drop_write(mqueue_mnt);
 out:
 	dput(dentry);
 	mntput(mqueue_mnt);
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 21/30] r/o bind mounts: get write access for vfs_rename() callers
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (19 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 20/30] r/o bind mounts: elevate write count for open()s Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 22/30] r/o bind mounts: elevate write count for chmod/chown callers Dave Hansen
                   ` (11 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


This also uses the little helper in the NFS code to make an if() a little bit
less ugly.  We introduced the helper at the beginning of the series.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/namei.c    |    4 ++++
 linux-2.6.git-dave/fs/nfsd/vfs.c |   17 +++++++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff -puN fs/namei.c~r-o-bind-mounts-elevate-write-count-over-calls-to-vfs_rename fs/namei.c
--- linux-2.6.git/fs/namei.c~r-o-bind-mounts-elevate-write-count-over-calls-to-vfs_rename	2008-02-08 13:04:56.000000000 -0800
+++ linux-2.6.git-dave/fs/namei.c	2008-02-08 13:04:56.000000000 -0800
@@ -2749,8 +2749,12 @@ static int do_rename(int olddfd, const c
 	if (new_dentry == trap)
 		goto exit5;
 
+	error = mnt_want_write(oldnd.mnt);
+	if (error)
+		goto exit5;
 	error = vfs_rename(old_dir->d_inode, old_dentry,
 				   new_dir->d_inode, new_dentry);
+	mnt_drop_write(oldnd.mnt);
 exit5:
 	dput(new_dentry);
 exit4:
diff -puN fs/nfsd/vfs.c~r-o-bind-mounts-elevate-write-count-over-calls-to-vfs_rename fs/nfsd/vfs.c
--- linux-2.6.git/fs/nfsd/vfs.c~r-o-bind-mounts-elevate-write-count-over-calls-to-vfs_rename	2008-02-08 13:04:56.000000000 -0800
+++ linux-2.6.git-dave/fs/nfsd/vfs.c	2008-02-08 13:04:56.000000000 -0800
@@ -1677,13 +1677,20 @@ nfsd_rename(struct svc_rqst *rqstp, stru
 	if (ndentry == trap)
 		goto out_dput_new;
 
-#ifdef MSNFS
-	if ((ffhp->fh_export->ex_flags & NFSEXP_MSNFS) &&
+	if (svc_msnfs(ffhp) &&
 		((atomic_read(&odentry->d_count) > 1)
 		 || (atomic_read(&ndentry->d_count) > 1))) {
 			host_err = -EPERM;
-	} else
-#endif
+			goto out_dput_new;
+	}
+
+	host_err = -EXDEV;
+	if (ffhp->fh_export->ex_mnt != tfhp->fh_export->ex_mnt)
+		goto out_dput_new;
+	host_err = mnt_want_write(ffhp->fh_export->ex_mnt);
+	if (host_err)
+		goto out_dput_new;
+
 	host_err = vfs_rename(fdir, odentry, tdir, ndentry);
 	if (!host_err && EX_ISSYNC(tfhp->fh_export)) {
 		host_err = nfsd_sync_dir(tdentry);
@@ -1691,6 +1698,8 @@ nfsd_rename(struct svc_rqst *rqstp, stru
 			host_err = nfsd_sync_dir(fdentry);
 	}
 
+	mnt_drop_write(ffhp->fh_export->ex_mnt);
+
  out_dput_new:
 	dput(ndentry);
  out_dput_old:
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 22/30] r/o bind mounts: elevate write count for chmod/chown callers
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (20 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 21/30] r/o bind mounts: get write access for vfs_rename() callers Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 23/30] r/o bind mounts: write counts for truncate() Dave Hansen
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


chown/chmod,etc...  don't call permission in the same way that the normal
"open for write" calls do.  They still write to the filesystem, so bump the
write count during these operations.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/open.c |   39 ++++++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)

diff -puN fs/open.c~r-o-bind-mounts-elevate-writer-count-for-chown-and-friends fs/open.c
--- linux-2.6.git/fs/open.c~r-o-bind-mounts-elevate-writer-count-for-chown-and-friends	2008-02-08 13:04:57.000000000 -0800
+++ linux-2.6.git-dave/fs/open.c	2008-02-08 13:04:57.000000000 -0800
@@ -571,12 +571,12 @@ asmlinkage long sys_fchmod(unsigned int 
 
 	audit_inode(NULL, dentry);
 
-	err = -EROFS;
-	if (IS_RDONLY(inode))
+	err = mnt_want_write(file->f_vfsmnt);
+	if (err)
 		goto out_putf;
 	err = -EPERM;
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		goto out_putf;
+		goto out_drop_write;
 	mutex_lock(&inode->i_mutex);
 	if (mode == (mode_t) -1)
 		mode = inode->i_mode;
@@ -585,6 +585,8 @@ asmlinkage long sys_fchmod(unsigned int 
 	err = notify_change(dentry, &newattrs);
 	mutex_unlock(&inode->i_mutex);
 
+out_drop_write:
+	mnt_drop_write(file->f_vfsmnt);
 out_putf:
 	fput(file);
 out:
@@ -604,13 +606,13 @@ asmlinkage long sys_fchmodat(int dfd, co
 		goto out;
 	inode = nd.dentry->d_inode;
 
-	error = -EROFS;
-	if (IS_RDONLY(inode))
+	error = mnt_want_write(nd.mnt);
+	if (error)
 		goto dput_and_out;
 
 	error = -EPERM;
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		goto dput_and_out;
+		goto out_drop_write;
 
 	mutex_lock(&inode->i_mutex);
 	if (mode == (mode_t) -1)
@@ -620,6 +622,8 @@ asmlinkage long sys_fchmodat(int dfd, co
 	error = notify_change(nd.dentry, &newattrs);
 	mutex_unlock(&inode->i_mutex);
 
+out_drop_write:
+	mnt_drop_write(nd.mnt);
 dput_and_out:
 	path_release(&nd);
 out:
@@ -642,9 +646,6 @@ static int chown_common(struct dentry * 
 		printk(KERN_ERR "chown_common: NULL inode\n");
 		goto out;
 	}
-	error = -EROFS;
-	if (IS_RDONLY(inode))
-		goto out;
 	error = -EPERM;
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
 		goto out;
@@ -675,7 +676,12 @@ asmlinkage long sys_chown(const char __u
 	error = user_path_walk(filename, &nd);
 	if (error)
 		goto out;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto out_release;
 	error = chown_common(nd.dentry, user, group);
+	mnt_drop_write(nd.mnt);
+out_release:
 	path_release(&nd);
 out:
 	return error;
@@ -695,7 +701,12 @@ asmlinkage long sys_fchownat(int dfd, co
 	error = __user_walk_fd(dfd, filename, follow, &nd);
 	if (error)
 		goto out;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto out_release;
 	error = chown_common(nd.dentry, user, group);
+	mnt_drop_write(nd.mnt);
+out_release:
 	path_release(&nd);
 out:
 	return error;
@@ -709,7 +720,12 @@ asmlinkage long sys_lchown(const char __
 	error = user_path_walk_link(filename, &nd);
 	if (error)
 		goto out;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto out_release;
 	error = chown_common(nd.dentry, user, group);
+	mnt_drop_write(nd.mnt);
+out_release:
 	path_release(&nd);
 out:
 	return error;
@@ -726,9 +742,14 @@ asmlinkage long sys_fchown(unsigned int 
 	if (!file)
 		goto out;
 
+	error = mnt_want_write(file->f_vfsmnt);
+	if (error)
+		goto out_fput;
 	dentry = file->f_path.dentry;
 	audit_inode(NULL, dentry);
 	error = chown_common(dentry, user, group);
+	mnt_drop_write(file->f_vfsmnt);
+out_fput:
 	fput(file);
 out:
 	return error;
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 23/30] r/o bind mounts: write counts for truncate()
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (21 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 22/30] r/o bind mounts: elevate write count for chmod/chown callers Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 24/30] r/o bind mounts: elevate count for xfs timestamp updates Dave Hansen
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen




Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/open.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff -puN fs/open.c~r-o-bind-mounts-elevate-writer-count-for-do_sys_truncate fs/open.c
--- linux-2.6.git/fs/open.c~r-o-bind-mounts-elevate-writer-count-for-do_sys_truncate	2008-02-08 13:04:57.000000000 -0800
+++ linux-2.6.git-dave/fs/open.c	2008-02-08 13:04:57.000000000 -0800
@@ -244,21 +244,21 @@ static long do_sys_truncate(const char _
 	if (!S_ISREG(inode->i_mode))
 		goto dput_and_out;
 
-	error = vfs_permission(&nd, MAY_WRITE);
+	error = mnt_want_write(nd.mnt);
 	if (error)
 		goto dput_and_out;
 
-	error = -EROFS;
-	if (IS_RDONLY(inode))
-		goto dput_and_out;
+	error = vfs_permission(&nd, MAY_WRITE);
+	if (error)
+		goto mnt_drop_write_and_out;
 
 	error = -EPERM;
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		goto dput_and_out;
+		goto mnt_drop_write_and_out;
 
 	error = get_write_access(inode);
 	if (error)
-		goto dput_and_out;
+		goto mnt_drop_write_and_out;
 
 	/*
 	 * Make sure that there are no leases.  get_write_access() protects
@@ -276,6 +276,8 @@ static long do_sys_truncate(const char _
 
 put_write_and_out:
 	put_write_access(inode);
+mnt_drop_write_and_out:
+	mnt_drop_write(nd.mnt);
 dput_and_out:
 	path_release(&nd);
 out:
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 24/30] r/o bind mounts: elevate count for xfs timestamp updates
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (22 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 23/30] r/o bind mounts: write counts for truncate() Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 25/30] r/o bind mounts: make access() use new r/o helper Dave Hansen
                   ` (8 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


Elevate the write count during the xfs m/ctime updates.

XFS has to do it's own timestamp updates due to an unfortunate VFS
design limitation, so it will have to track writers by itself aswell.

[hch: split out from the touch_atime patch as it's not related to it at all]

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---

 linux-2.6.git-dave/fs/xfs/linux-2.6/xfs_iops.c |    7 -------
 linux-2.6.git-dave/fs/xfs/linux-2.6/xfs_lrw.c  |    9 ++++++++-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff -puN fs/xfs/linux-2.6/xfs_iops.c~r-o-bind-mounts-elevate-writer-count-for-xfs-timestamp-updates fs/xfs/linux-2.6/xfs_iops.c
--- linux-2.6.git/fs/xfs/linux-2.6/xfs_iops.c~r-o-bind-mounts-elevate-writer-count-for-xfs-timestamp-updates	2008-02-08 13:04:58.000000000 -0800
+++ linux-2.6.git-dave/fs/xfs/linux-2.6/xfs_iops.c	2008-02-08 13:04:58.000000000 -0800
@@ -157,13 +157,6 @@ xfs_ichgtime_fast(
 	 */
 	ASSERT((flags & XFS_ICHGTIME_ACC) == 0);
 
-	/*
-	 * We're not supposed to change timestamps in readonly-mounted
-	 * filesystems.  Throw it away if anyone asks us.
-	 */
-	if (unlikely(IS_RDONLY(inode)))
-		return;
-
 	if (flags & XFS_ICHGTIME_MOD) {
 		tvp = &inode->i_mtime;
 		ip->i_d.di_mtime.t_sec = (__int32_t)tvp->tv_sec;
diff -puN fs/xfs/linux-2.6/xfs_lrw.c~r-o-bind-mounts-elevate-writer-count-for-xfs-timestamp-updates fs/xfs/linux-2.6/xfs_lrw.c
--- linux-2.6.git/fs/xfs/linux-2.6/xfs_lrw.c~r-o-bind-mounts-elevate-writer-count-for-xfs-timestamp-updates	2008-02-08 13:04:58.000000000 -0800
+++ linux-2.6.git-dave/fs/xfs/linux-2.6/xfs_lrw.c	2008-02-08 13:04:58.000000000 -0800
@@ -51,6 +51,7 @@
 #include "xfs_vnodeops.h"
 
 #include <linux/capability.h>
+#include <linux/mount.h>
 #include <linux/writeback.h>
 
 
@@ -679,10 +680,16 @@ start:
 	if (new_size > xip->i_size)
 		xip->i_new_size = new_size;
 
-	if (likely(!(ioflags & IO_INVIS))) {
+	/*
+	 * We're not supposed to change timestamps in readonly-mounted
+	 * filesystems.  Throw it away if anyone asks us.
+	 */
+	if (likely(!(ioflags & IO_INVIS) &&
+		   !mnt_want_write(file->f_vfsmnt))) {
 		file_update_time(file);
 		xfs_ichgtime_fast(xip, inode,
 				  XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+		mnt_drop_write(file->f_vfsmnt);
 	}
 
 	/*
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 25/30] r/o bind mounts: make access() use new r/o helper
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (23 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 24/30] r/o bind mounts: elevate count for xfs timestamp updates Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 26/30] r/o bind mounts: check mnt instead of superblock directly Dave Hansen
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


It is OK to let access() go without using a mnt_want/drop_write() pair because
it doesn't actually do writes to the filesystem, and it is inherently racy
anyway.  This is a rare case when it is OK to use __mnt_is_readonly()
directly.

Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/open.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff -puN fs/open.c~r-o-bind-mounts-make-access-use-mnt-check fs/open.c
--- linux-2.6.git/fs/open.c~r-o-bind-mounts-make-access-use-mnt-check	2008-02-08 13:04:58.000000000 -0800
+++ linux-2.6.git-dave/fs/open.c	2008-02-08 13:04:58.000000000 -0800
@@ -459,8 +459,17 @@ asmlinkage long sys_faccessat(int dfd, c
 	if(res || !(mode & S_IWOTH) ||
 	   special_file(nd.dentry->d_inode->i_mode))
 		goto out_path_release;
-
-	if(IS_RDONLY(nd.dentry->d_inode))
+	/*
+	 * This is a rare case where using __mnt_is_readonly()
+	 * is OK without a mnt_want/drop_write() pair.  Since
+	 * no actual write to the fs is performed here, we do
+	 * not need to telegraph to that to anyone.
+	 *
+	 * By doing this, we accept that this access is
+	 * inherently racy and know that the fs may change
+	 * state before we even see this result.
+	 */
+	if (__mnt_is_readonly(nd.mnt))
 		res = -EROFS;
 
 out_path_release:
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 26/30] r/o bind mounts: check mnt instead of superblock directly
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (24 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 25/30] r/o bind mounts: make access() use new r/o helper Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 27/30] r/o bind mounts: get callers of vfs_mknod/create() Dave Hansen
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


If we depend on the inodes for writeability, we will not catch the r/o mounts
when implemented.

This patches uses __mnt_want_write().  It does not guarantee that the mount
will stay writeable after the check.  But, this is OK for one of the checks
because it is just for a printk().

The other two are probably unnecessary and duplicate existing checks in the
VFS.  This won't make them better checks than before, but it will make them
detect r/o mounts.

Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/nfs/dir.c  |    3 ++-
 linux-2.6.git-dave/fs/nfsd/vfs.c |    5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff -puN fs/nfs/dir.c~r-o-bind-mounts-nfs-check-mnt-instead-of-superblock-directly fs/nfs/dir.c
--- linux-2.6.git/fs/nfs/dir.c~r-o-bind-mounts-nfs-check-mnt-instead-of-superblock-directly	2008-02-08 13:04:59.000000000 -0800
+++ linux-2.6.git-dave/fs/nfs/dir.c	2008-02-08 13:04:59.000000000 -0800
@@ -967,7 +967,8 @@ static int is_atomic_open(struct inode *
 	if (nd->flags & LOOKUP_DIRECTORY)
 		return 0;
 	/* Are we trying to write to a read only partition? */
-	if (IS_RDONLY(dir) && (nd->intent.open.flags & (O_CREAT|O_TRUNC|FMODE_WRITE)))
+	if (__mnt_is_readonly(nd->mnt) &&
+	    (nd->intent.open.flags & (O_CREAT|O_TRUNC|FMODE_WRITE)))
 		return 0;
 	return 1;
 }
diff -puN fs/nfsd/vfs.c~r-o-bind-mounts-nfs-check-mnt-instead-of-superblock-directly fs/nfsd/vfs.c
--- linux-2.6.git/fs/nfsd/vfs.c~r-o-bind-mounts-nfs-check-mnt-instead-of-superblock-directly	2008-02-08 13:04:59.000000000 -0800
+++ linux-2.6.git-dave/fs/nfsd/vfs.c	2008-02-08 13:04:59.000000000 -0800
@@ -1873,7 +1873,7 @@ nfsd_permission(struct svc_rqst *rqstp, 
 		inode->i_mode,
 		IS_IMMUTABLE(inode)?	" immut" : "",
 		IS_APPEND(inode)?	" append" : "",
-		IS_RDONLY(inode)?	" ro" : "");
+		__mnt_is_readonly(exp->ex_mnt)?	" ro" : "");
 	dprintk("      owner %d/%d user %d/%d\n",
 		inode->i_uid, inode->i_gid, current->fsuid, current->fsgid);
 #endif
@@ -1884,7 +1884,8 @@ nfsd_permission(struct svc_rqst *rqstp, 
 	 */
 	if (!(acc & MAY_LOCAL_ACCESS))
 		if (acc & (MAY_WRITE | MAY_SATTR | MAY_TRUNC)) {
-			if (exp_rdonly(rqstp, exp) || IS_RDONLY(inode))
+			if (exp_rdonly(rqstp, exp) ||
+			    __mnt_is_readonly(exp->ex_mnt))
 				return nfserr_rofs;
 			if (/* (acc & MAY_WRITE) && */ IS_IMMUTABLE(inode))
 				return nfserr_perm;
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 27/30] r/o bind mounts: get callers of vfs_mknod/create()
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (25 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 26/30] r/o bind mounts: check mnt instead of superblock directly Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 28/30] r/o bind mounts: track numbers of writers to mounts Dave Hansen
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


This takes care of all of the direct callers of vfs_mknod().
Since a few of these cases also handle normal file creation
as well, this also covers some calls to vfs_create().

So that we don't have to make three mnt_want/drop_write()
calls inside of the switch statement, we move some of its
logic outside of the switch and into a helper function
suggested by Christoph.

This also encapsulates a fix for mknod(S_IFREG) that Miklos
found.

Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/namei.c         |   43 +++++++++++++++++++++++++---------
 linux-2.6.git-dave/fs/nfsd/vfs.c      |    4 +++
 linux-2.6.git-dave/net/unix/af_unix.c |    4 +++
 3 files changed, 40 insertions(+), 11 deletions(-)

diff -puN fs/namei.c~r-o-bind-mounts-sys_mknodat-elevate-write-count-for-vfs_mknod-create fs/namei.c
--- linux-2.6.git/fs/namei.c~r-o-bind-mounts-sys_mknodat-elevate-write-count-for-vfs_mknod-create	2008-02-08 13:05:00.000000000 -0800
+++ linux-2.6.git-dave/fs/namei.c	2008-02-08 13:05:00.000000000 -0800
@@ -2036,6 +2036,23 @@ int vfs_mknod(struct inode *dir, struct 
 	return error;
 }
 
+static int may_mknod(mode_t mode)
+{
+	switch (mode & S_IFMT) {
+	case S_IFREG:
+	case S_IFCHR:
+	case S_IFBLK:
+	case S_IFIFO:
+	case S_IFSOCK:
+	case 0: /* zero mode translates to S_IFREG */
+		return 0;
+	case S_IFDIR:
+		return -EPERM;
+	default:
+		return -EINVAL;
+	}
+}
+
 asmlinkage long sys_mknodat(int dfd, const char __user *filename, int mode,
 				unsigned dev)
 {
@@ -2054,12 +2071,19 @@ asmlinkage long sys_mknodat(int dfd, con
 	if (error)
 		goto out;
 	dentry = lookup_create(&nd, 0);
-	error = PTR_ERR(dentry);
-
+	if (IS_ERR(dentry)) {
+		error = PTR_ERR(dentry);
+		goto out_unlock;
+	}
 	if (!IS_POSIXACL(nd.dentry->d_inode))
 		mode &= ~current->fs->umask;
-	if (!IS_ERR(dentry)) {
-		switch (mode & S_IFMT) {
+	error = may_mknod(mode);
+	if (error)
+		goto out_dput;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto out_dput;
+	switch (mode & S_IFMT) {
 		case 0: case S_IFREG:
 			error = vfs_create(nd.dentry->d_inode,dentry,mode,&nd);
 			break;
@@ -2070,14 +2094,11 @@ asmlinkage long sys_mknodat(int dfd, con
 		case S_IFIFO: case S_IFSOCK:
 			error = vfs_mknod(nd.dentry->d_inode,dentry,mode,0);
 			break;
-		case S_IFDIR:
-			error = -EPERM;
-			break;
-		default:
-			error = -EINVAL;
-		}
-		dput(dentry);
 	}
+	mnt_drop_write(nd.mnt);
+out_dput:
+	dput(dentry);
+out_unlock:
 	mutex_unlock(&nd.dentry->d_inode->i_mutex);
 	path_release(&nd);
 out:
diff -puN fs/nfsd/vfs.c~r-o-bind-mounts-sys_mknodat-elevate-write-count-for-vfs_mknod-create fs/nfsd/vfs.c
--- linux-2.6.git/fs/nfsd/vfs.c~r-o-bind-mounts-sys_mknodat-elevate-write-count-for-vfs_mknod-create	2008-02-08 13:05:00.000000000 -0800
+++ linux-2.6.git-dave/fs/nfsd/vfs.c	2008-02-08 13:05:00.000000000 -0800
@@ -1263,7 +1263,11 @@ nfsd_create(struct svc_rqst *rqstp, stru
 	case S_IFBLK:
 	case S_IFIFO:
 	case S_IFSOCK:
+		host_err = mnt_want_write(fhp->fh_export->ex_mnt);
+		if (host_err)
+			break;
 		host_err = vfs_mknod(dirp, dchild, iap->ia_mode, rdev);
+		mnt_drop_write(fhp->fh_export->ex_mnt);
 		break;
 	default:
 	        printk("nfsd: bad file type %o in nfsd_create\n", type);
diff -puN net/unix/af_unix.c~r-o-bind-mounts-sys_mknodat-elevate-write-count-for-vfs_mknod-create net/unix/af_unix.c
--- linux-2.6.git/net/unix/af_unix.c~r-o-bind-mounts-sys_mknodat-elevate-write-count-for-vfs_mknod-create	2008-02-08 13:05:00.000000000 -0800
+++ linux-2.6.git-dave/net/unix/af_unix.c	2008-02-08 13:05:00.000000000 -0800
@@ -819,7 +819,11 @@ static int unix_bind(struct socket *sock
 		 */
 		mode = S_IFSOCK |
 		       (SOCK_INODE(sock)->i_mode & ~current->fs->umask);
+		err = mnt_want_write(nd.mnt);
+		if (err)
+			goto out_mknod_dput;
 		err = vfs_mknod(nd.dentry->d_inode, dentry, mode, 0);
+		mnt_drop_write(nd.mnt);
 		if (err)
 			goto out_mknod_dput;
 		mutex_unlock(&nd.dentry->d_inode->i_mutex);
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 28/30] r/o bind mounts: track numbers of writers to mounts
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (26 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 27/30] r/o bind mounts: get callers of vfs_mknod/create() Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 29/30] r/o bind mounts: honor mount writer counts at remount Dave Hansen
                   ` (4 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


This is the real meat of the entire series.  It actually
implements the tracking of the number of writers to a mount.
However, it causes scalability problems because there can be
hundreds of cpus doing open()/close() on files on the same mnt at
the same time.  Even an atomic_t in the mnt has massive scalaing
problems because the cacheline gets so terribly contended.

This uses a statically-allocated percpu variable.  All want/drop
operations are local to a cpu as long that cpu operates on the same
mount, and there are no writer count imbalances.  Writer count
imbalances happen when a write is taken on one cpu, and released
on another, like when an open/close pair is performed on two

Upon a remount,ro request, all of the data from the percpu
variables is collected (expensive, but very rare) and we determine
if there are any outstanding writers to the mount.

I've written a little benchmark to sit in a loop for a couple of
seconds in several cpus in parallel doing open/write/close loops.

http://sr71.net/~dave/linux/openbench.c

The code in here is a a worst-possible case for this patch.  It
does opens on a _pair_ of files in two different mounts in parallel.
This should cause my code to lose its "operate on the same mount"
optimization completely.  This worst-case scenario causes a 3%
degredation in the benchmark.

I could probably get rid of even this 3%, but it would be more
complex than what I have here, and I think this is getting into
acceptable territory.  In practice, I expect writing more than 3
bytes to a file, as well as disk I/O to mask any effects that this
has.

(To get rid of that 3%, we could have an #defined number of mounts
in the percpu variable.  So, instead of a CPU getting operate only
on percpu data when it accesses only one mount, it could stay on
percpu data when it only accesses N or fewer mounts.)

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/namespace.c        |  240 +++++++++++++++++++++++++++++--
 linux-2.6.git-dave/include/linux/mount.h |    7 
 2 files changed, 232 insertions(+), 15 deletions(-)

diff -puN fs/namespace.c~r-o-bind-mounts-track-number-of-mount-writers fs/namespace.c
--- linux-2.6.git/fs/namespace.c~r-o-bind-mounts-track-number-of-mount-writers	2008-02-08 13:05:00.000000000 -0800
+++ linux-2.6.git-dave/fs/namespace.c	2008-02-08 13:05:00.000000000 -0800
@@ -17,6 +17,7 @@
 #include <linux/quotaops.h>
 #include <linux/acct.h>
 #include <linux/capability.h>
+#include <linux/cpumask.h>
 #include <linux/module.h>
 #include <linux/sysfs.h>
 #include <linux/seq_file.h>
@@ -55,6 +56,8 @@ static inline unsigned long hash(struct 
 	return tmp & (HASH_SIZE - 1);
 }
 
+#define MNT_WRITER_UNDERFLOW_LIMIT -(1<<16)
+
 struct vfsmount *alloc_vfsmnt(const char *name)
 {
 	struct vfsmount *mnt = kmem_cache_zalloc(mnt_cache, GFP_KERNEL);
@@ -68,6 +71,7 @@ struct vfsmount *alloc_vfsmnt(const char
 		INIT_LIST_HEAD(&mnt->mnt_share);
 		INIT_LIST_HEAD(&mnt->mnt_slave_list);
 		INIT_LIST_HEAD(&mnt->mnt_slave);
+		atomic_set(&mnt->__mnt_writers, 0);
 		if (name) {
 			int size = strlen(name) + 1;
 			char *newname = kmalloc(size, GFP_KERNEL);
@@ -88,6 +92,86 @@ struct vfsmount *alloc_vfsmnt(const char
  * we can determine when writes are able to occur to
  * a filesystem.
  */
+/*
+ * __mnt_is_readonly: check whether a mount is read-only
+ * @mnt: the mount to check for its write status
+ *
+ * This shouldn't be used directly ouside of the VFS.
+ * It does not guarantee that the filesystem will stay
+ * r/w, just that it is right *now*.  This can not and
+ * should not be used in place of IS_RDONLY(inode).
+ * mnt_want/drop_write() will _keep_ the filesystem
+ * r/w.
+ */
+int __mnt_is_readonly(struct vfsmount *mnt)
+{
+	return (mnt->mnt_sb->s_flags & MS_RDONLY);
+}
+EXPORT_SYMBOL_GPL(__mnt_is_readonly);
+
+struct mnt_writer {
+	/*
+	 * If holding multiple instances of this lock, they
+	 * must be ordered by cpu number.
+	 */
+	spinlock_t lock;
+	struct lock_class_key lock_class; /* compiles out with !lockdep */
+	unsigned long count;
+	struct vfsmount *mnt;
+} ____cacheline_aligned_in_smp;
+static DEFINE_PER_CPU(struct mnt_writer, mnt_writers);
+
+static int __init init_mnt_writers(void)
+{
+	int cpu;
+	for_each_possible_cpu(cpu) {
+		struct mnt_writer *writer = &per_cpu(mnt_writers, cpu);
+		spin_lock_init(&writer->lock);
+		lockdep_set_class(&writer->lock, &writer->lock_class);
+		writer->count = 0;
+	}
+	return 0;
+}
+fs_initcall(init_mnt_writers);
+
+static void unlock_mnt_writers(void)
+{
+	int cpu;
+	struct mnt_writer *cpu_writer;
+
+	for_each_possible_cpu(cpu) {
+		cpu_writer = &per_cpu(mnt_writers, cpu);
+		spin_unlock(&cpu_writer->lock);
+	}
+}
+
+static inline void __clear_mnt_count(struct mnt_writer *cpu_writer)
+{
+	if (!cpu_writer->mnt)
+		return;
+	atomic_add(cpu_writer->count, &cpu_writer->mnt->__mnt_writers);
+	cpu_writer->count = 0;
+}
+ /*
+ * must hold cpu_writer->lock
+ */
+static inline void use_cpu_writer_for_mount(struct mnt_writer *cpu_writer,
+					  struct vfsmount *mnt)
+{
+	if (cpu_writer->mnt == mnt)
+		return;
+	__clear_mnt_count(cpu_writer);
+	cpu_writer->mnt = mnt;
+}
+
+/*
+ * Most r/o checks on a fs are for operations that take
+ * discrete amounts of time, like a write() or unlink().
+ * We must keep track of when those operations start
+ * (for permission checks) and when they end, so that
+ * we can determine when writes are able to occur to
+ * a filesystem.
+ */
 /**
  * mnt_want_write - get write access to a mount
  * @mnt: the mount on which to take a write
@@ -100,12 +184,77 @@ struct vfsmount *alloc_vfsmnt(const char
  */
 int mnt_want_write(struct vfsmount *mnt)
 {
-	if (__mnt_is_readonly(mnt))
-		return -EROFS;
-	return 0;
+	int ret = 0;
+	struct mnt_writer *cpu_writer;
+
+	cpu_writer = &get_cpu_var(mnt_writers);
+	spin_lock(&cpu_writer->lock);
+	if (__mnt_is_readonly(mnt)) {
+		ret = -EROFS;
+		goto out;
+	}
+	use_cpu_writer_for_mount(cpu_writer, mnt);
+	cpu_writer->count++;
+out:
+	spin_unlock(&cpu_writer->lock);
+	put_cpu_var(mnt_writers);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(mnt_want_write);
 
+static void lock_mnt_writers(void)
+{
+	int cpu;
+	struct mnt_writer *cpu_writer;
+
+	for_each_possible_cpu(cpu) {
+		cpu_writer = &per_cpu(mnt_writers, cpu);
+		spin_lock(&cpu_writer->lock);
+		__clear_mnt_count(cpu_writer);
+		cpu_writer->mnt = NULL;
+	}
+}
+
+/*
+ * These per-cpu write counts are not guaranteed to have
+ * matched increments and decrements on any given cpu.
+ * A file open()ed for write on one cpu and close()d on
+ * another cpu will imbalance this count.  Make sure it
+ * does not get too far out of whack.
+ */
+static void handle_write_count_underflow(struct vfsmount *mnt)
+{
+	if (atomic_read(&mnt->__mnt_writers) >=
+	    MNT_WRITER_UNDERFLOW_LIMIT)
+		return;
+	/*
+	 * It isn't necessary to hold all of the locks
+	 * at the same time, but doing it this way makes
+	 * us share a lot more code.
+	 */
+	lock_mnt_writers();
+	/*
+	 * vfsmount_lock is for mnt_flags.
+	 */
+	spin_lock(&vfsmount_lock);
+	/*
+	 * If coalescing the per-cpu writer counts did not
+	 * get us back to a positive writer count, we have
+	 * a bug.
+	 */
+	if ((atomic_read(&mnt->__mnt_writers) < 0) &&
+	    !(mnt->mnt_flags & MNT_IMBALANCED_WRITE_COUNT)) {
+		printk(KERN_DEBUG "leak detected on mount(%p) writers "
+				"count: %d\n",
+			mnt, atomic_read(&mnt->__mnt_writers));
+		WARN_ON(1);
+		/* use the flag to keep the dmesg spam down */
+		mnt->mnt_flags |= MNT_IMBALANCED_WRITE_COUNT;
+	}
+	spin_unlock(&vfsmount_lock);
+	unlock_mnt_writers();
+}
+
 /**
  * mnt_drop_write - give up write access to a mount
  * @mnt: the mount on which to give up write access
@@ -116,23 +265,61 @@ EXPORT_SYMBOL_GPL(mnt_want_write);
  */
 void mnt_drop_write(struct vfsmount *mnt)
 {
+	int must_check_underflow = 0;
+	struct mnt_writer *cpu_writer;
+
+	cpu_writer = &get_cpu_var(mnt_writers);
+	spin_lock(&cpu_writer->lock);
+
+	use_cpu_writer_for_mount(cpu_writer, mnt);
+	if (cpu_writer->count > 0) {
+		cpu_writer->count--;
+	} else {
+		must_check_underflow = 1;
+		atomic_dec(&mnt->__mnt_writers);
+	}
+
+	spin_unlock(&cpu_writer->lock);
+	/*
+	 * Logically, we could call this each time,
+	 * but the __mnt_writers cacheline tends to
+	 * be cold, and makes this expensive.
+	 */
+	if (must_check_underflow)
+		handle_write_count_underflow(mnt);
+	/*
+	 * This could be done right after the spinlock
+	 * is taken because the spinlock keeps us on
+	 * the cpu, and disables preemption.  However,
+	 * putting it here bounds the amount that
+	 * __mnt_writers can underflow.  Without it,
+	 * we could theoretically wrap __mnt_writers.
+	 */
+	put_cpu_var(mnt_writers);
 }
 EXPORT_SYMBOL_GPL(mnt_drop_write);
 
-/*
- * __mnt_is_readonly: check whether a mount is read-only
- * @mnt: the mount to check for its write status
- *
- * This shouldn't be used directly ouside of the VFS.
- * It does not guarantee that the filesystem will stay
- * r/w, just that it is right *now*.  This can not and
- * should not be used in place of IS_RDONLY(inode).
- */
-int __mnt_is_readonly(struct vfsmount *mnt)
+int mnt_make_readonly(struct vfsmount *mnt)
 {
-	return (mnt->mnt_sb->s_flags & MS_RDONLY);
+	int ret = 0;
+
+	lock_mnt_writers();
+	/*
+	 * With all the locks held, this value is stable
+	 */
+	if (atomic_read(&mnt->__mnt_writers) > 0) {
+		ret = -EBUSY;
+		goto out;
+	}
+	/*
+	 * actually set mount's r/o flag here to make
+	 * __mnt_is_readonly() true, which keeps anyone
+	 * from doing a successful mnt_want_write().
+	 */
+out:
+	unlock_mnt_writers();
+	return ret;
 }
-EXPORT_SYMBOL_GPL(__mnt_is_readonly);
 
 int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
@@ -327,7 +514,30 @@ static struct vfsmount *clone_mnt(struct
 
 static inline void __mntput(struct vfsmount *mnt)
 {
+	int cpu;
 	struct super_block *sb = mnt->mnt_sb;
+	/*
+	 * We don't have to hold all of the locks at the
+	 * same time here because we know that we're the
+	 * last reference to mnt and that no new writers
+	 * can come in.
+	 */
+	for_each_possible_cpu(cpu) {
+		struct mnt_writer *cpu_writer = &per_cpu(mnt_writers, cpu);
+		if (cpu_writer->mnt != mnt)
+			continue;
+		spin_lock(&cpu_writer->lock);
+		atomic_add(cpu_writer->count, &mnt->__mnt_writers);
+		cpu_writer->count = 0;
+		spin_unlock(&cpu_writer->lock);
+	}
+	/*
+	 * This probably indicates that somebody messed
+	 * up a mnt_want/drop_write() pair.  If this
+	 * happens, the filesystem was probably unable
+	 * to make r/w->r/o transitions.
+	 */
+	WARN_ON(atomic_read(&mnt->__mnt_writers));
 	dput(mnt->mnt_root);
 	free_vfsmnt(mnt);
 	deactivate_super(sb);
diff -puN include/linux/mount.h~r-o-bind-mounts-track-number-of-mount-writers include/linux/mount.h
--- linux-2.6.git/include/linux/mount.h~r-o-bind-mounts-track-number-of-mount-writers	2008-02-08 13:05:00.000000000 -0800
+++ linux-2.6.git-dave/include/linux/mount.h	2008-02-08 13:05:00.000000000 -0800
@@ -14,6 +14,7 @@
 
 #include <linux/types.h>
 #include <linux/list.h>
+#include <linux/nodemask.h>
 #include <linux/spinlock.h>
 #include <asm/atomic.h>
 
@@ -30,6 +31,7 @@ struct mnt_namespace;
 #define MNT_RELATIME	0x20
 
 #define MNT_SHRINKABLE	0x100
+#define MNT_IMBALANCED_WRITE_COUNT	0x200 /* just for debugging */
 
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */
@@ -61,6 +63,11 @@ struct vfsmount {
 	atomic_t mnt_count;
 	int mnt_expiry_mark;		/* true if marked for expiry */
 	int mnt_pinned;
+	/*
+	 * This value is not stable unless all of the mnt_writers[] spinlocks
+	 * are held, and all mnt_writer[]s on this mount have 0 as their ->count
+	 */
+	atomic_t __mnt_writers;
 };
 
 static inline struct vfsmount *mntget(struct vfsmount *mnt)
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 29/30] r/o bind mounts: honor mount writer counts at remount
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (27 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 28/30] r/o bind mounts: track numbers of writers to mounts Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-08 22:27 ` [RFC][PATCH 30/30] r/o bind mounts: debugging for missed calls Dave Hansen
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


Originally from: Herbert Poetzl <herbert@13thfloor.at>

This is the core of the read-only bind mount patch set.

Note that this does _not_ add a "ro" option directly to the bind mount
operation.  If you require such a mount, you must first do the bind, then
follow it up with a 'mount -o remount,ro' operation:

If you wish to have a r/o bind mount of /foo on bar:

	mount --bind /foo /bar
	mount -o remount,ro /bar

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/namespace.c        |   50 ++++++++++++++++++++++++++-----
 linux-2.6.git-dave/include/linux/mount.h |    1 
 2 files changed, 44 insertions(+), 7 deletions(-)

diff -puN fs/namespace.c~r-o-bind-mounts-honor-r-w-changes-at-do_remount-time fs/namespace.c
--- linux-2.6.git/fs/namespace.c~r-o-bind-mounts-honor-r-w-changes-at-do_remount-time	2008-02-08 13:05:01.000000000 -0800
+++ linux-2.6.git-dave/fs/namespace.c	2008-02-08 13:05:01.000000000 -0800
@@ -105,7 +105,11 @@ struct vfsmount *alloc_vfsmnt(const char
  */
 int __mnt_is_readonly(struct vfsmount *mnt)
 {
-	return (mnt->mnt_sb->s_flags & MS_RDONLY);
+	if (mnt->mnt_flags & MNT_READONLY)
+		return 1;
+	if (mnt->mnt_sb->s_flags & MS_RDONLY)
+		return 1;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(__mnt_is_readonly);
 
@@ -299,7 +303,7 @@ void mnt_drop_write(struct vfsmount *mnt
 }
 EXPORT_SYMBOL_GPL(mnt_drop_write);
 
-int mnt_make_readonly(struct vfsmount *mnt)
+static int mnt_make_readonly(struct vfsmount *mnt)
 {
 	int ret = 0;
 
@@ -312,15 +316,25 @@ int mnt_make_readonly(struct vfsmount *m
 		goto out;
 	}
 	/*
-	 * actually set mount's r/o flag here to make
-	 * __mnt_is_readonly() true, which keeps anyone
-	 * from doing a successful mnt_want_write().
+	 * nobody can do a successful mnt_want_write() with all
+	 * of the counts in MNT_DENIED_WRITE and the locks held.
 	 */
+	spin_lock(&vfsmount_lock);
+	if (!ret)
+		mnt->mnt_flags |= MNT_READONLY;
+	spin_unlock(&vfsmount_lock);
 out:
 	unlock_mnt_writers();
 	return ret;
 }
 
+static void __mnt_unmake_readonly(struct vfsmount *mnt)
+{
+	spin_lock(&vfsmount_lock);
+	mnt->mnt_flags &= ~MNT_READONLY;
+	spin_unlock(&vfsmount_lock);
+}
+
 int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
@@ -643,7 +657,7 @@ static int show_vfsmnt(struct seq_file *
 		seq_putc(m, '.');
 		mangle(m, mnt->mnt_sb->s_subtype);
 	}
-	seq_puts(m, mnt->mnt_sb->s_flags & MS_RDONLY ? " ro" : " rw");
+	seq_puts(m, __mnt_is_readonly(mnt) ? " ro" : " rw");
 	for (fs_infop = fs_info; fs_infop->flag; fs_infop++) {
 		if (mnt->mnt_sb->s_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
@@ -1231,6 +1245,23 @@ out:
 	return err;
 }
 
+static int change_mount_flags(struct vfsmount *mnt, int ms_flags)
+{
+	int error = 0;
+	int readonly_request = 0;
+
+	if (ms_flags & MS_RDONLY)
+		readonly_request = 1;
+	if (readonly_request == __mnt_is_readonly(mnt))
+		return 0;
+
+	if (readonly_request)
+		error = mnt_make_readonly(mnt);
+	else
+		__mnt_unmake_readonly(mnt);
+	return error;
+}
+
 /*
  * change filesystem flags. dir should be a physical root of filesystem.
  * If you've mounted a non-root directory somewhere and want to do remount
@@ -1252,7 +1283,10 @@ static int do_remount(struct nameidata *
 		return -EINVAL;
 
 	down_write(&sb->s_umount);
-	err = do_remount_sb(sb, flags, data, 0);
+	if (flags & MS_BIND)
+		err = change_mount_flags(nd->mnt, flags);
+	else
+		err = do_remount_sb(sb, flags, data, 0);
 	if (!err)
 		nd->mnt->mnt_flags = mnt_flags;
 	up_write(&sb->s_umount);
@@ -1696,6 +1730,8 @@ long do_mount(char *dev_name, char *dir_
 		mnt_flags |= MNT_NODIRATIME;
 	if (flags & MS_RELATIME)
 		mnt_flags |= MNT_RELATIME;
+	if (flags & MS_RDONLY)
+		mnt_flags |= MNT_READONLY;
 
 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
 		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT);
diff -puN include/linux/mount.h~r-o-bind-mounts-honor-r-w-changes-at-do_remount-time include/linux/mount.h
--- linux-2.6.git/include/linux/mount.h~r-o-bind-mounts-honor-r-w-changes-at-do_remount-time	2008-02-08 13:05:01.000000000 -0800
+++ linux-2.6.git-dave/include/linux/mount.h	2008-02-08 13:05:01.000000000 -0800
@@ -29,6 +29,7 @@ struct mnt_namespace;
 #define MNT_NOATIME	0x08
 #define MNT_NODIRATIME	0x10
 #define MNT_RELATIME	0x20
+#define MNT_READONLY	0x40	/* does the user want this to be r/o? */
 
 #define MNT_SHRINKABLE	0x100
 #define MNT_IMBALANCED_WRITE_COUNT	0x200 /* just for debugging */
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC][PATCH 30/30] r/o bind mounts: debugging for missed calls
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (28 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 29/30] r/o bind mounts: honor mount writer counts at remount Dave Hansen
@ 2008-02-08 22:27 ` Dave Hansen
  2008-02-09  6:39 ` [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Christoph Hellwig
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2008-02-08 22:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: hch, viro, viro, miklos, Dave Hansen


There have been a few oopses caused by 'struct file's with NULL f_vfsmnts. 
There was also a set of potentially missed mnt_want_write()s from
dentry_open() calls.

This patch provides a very simple debugging framework to catch these kinds of
bugs.  It will WARN_ON() them, but should stop us from having any oopses or
mnt_writer count imbalances.

I'm quite convinced that this is a good thing because it found bugs in the
stuff I was working on as soon as I wrote it.

[hch: made it conditional on a debug option.
      But it's still a little bit too ugly]

[hch: merged forced remount r/o fix from Dave and akpm's fix for the fix]

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 linux-2.6.git-dave/fs/file_table.c    |   11 ++++++-
 linux-2.6.git-dave/fs/open.c          |   12 +++++++-
 linux-2.6.git-dave/fs/super.c         |    3 ++
 linux-2.6.git-dave/include/linux/fs.h |   49 ++++++++++++++++++++++++++++++++++
 linux-2.6.git-dave/lib/Kconfig.debug  |   10 ++++++
 5 files changed, 82 insertions(+), 3 deletions(-)

diff -puN fs/file_table.c~keep-track-of-mnt_writer-state-of-struct-file fs/file_table.c
--- linux-2.6.git/fs/file_table.c~keep-track-of-mnt_writer-state-of-struct-file	2008-02-08 13:05:02.000000000 -0800
+++ linux-2.6.git-dave/fs/file_table.c	2008-02-08 13:05:02.000000000 -0800
@@ -42,6 +42,7 @@ static inline void file_free_rcu(struct 
 static inline void file_free(struct file *f)
 {
 	percpu_counter_dec(&nr_files);
+	file_check_state(f);
 	call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
 }
 
@@ -207,6 +208,7 @@ int init_file(struct file *file, struct 
 	 * that we can do debugging checks at __fput()e
 	 */
 	if ((mode & FMODE_WRITE) && !special_file(dentry->d_inode->i_mode)) {
+		file_take_write(file);
 		error = mnt_want_write(mnt);
 		WARN_ON(error);
 	}
@@ -237,8 +239,13 @@ void drop_file_write_access(struct file 
 	struct inode *inode = dentry->d_inode;
 
 	put_write_access(inode);
-	if (!special_file(inode->i_mode))
-		mnt_drop_write(mnt);
+
+	if (special_file(inode->i_mode))
+		return;
+	if (file_check_writeable(file) != 0)
+		return;
+	mnt_drop_write(mnt);
+	file_release_write(file);
 }
 EXPORT_SYMBOL_GPL(drop_file_write_access);
 
diff -puN fs/open.c~keep-track-of-mnt_writer-state-of-struct-file fs/open.c
--- linux-2.6.git/fs/open.c~keep-track-of-mnt_writer-state-of-struct-file	2008-02-08 13:05:02.000000000 -0800
+++ linux-2.6.git-dave/fs/open.c	2008-02-08 13:05:02.000000000 -0800
@@ -810,6 +810,8 @@ static struct file *__dentry_open(struct
 		error = __get_file_write_access(inode, mnt);
 		if (error)
 			goto cleanup_file;
+		if (!special_file(inode->i_mode))
+			file_take_write(f);
 	}
 
 	f->f_mapping = inode->i_mapping;
@@ -851,8 +853,16 @@ cleanup_all:
 	fops_put(f->f_op);
 	if (f->f_mode & FMODE_WRITE) {
 		put_write_access(inode);
-		if (!special_file(inode->i_mode))
+		if (!special_file(inode->i_mode)) {
+			/*
+			 * We don't consider this a real
+			 * mnt_want/drop_write() pair
+			 * because it all happenend right
+			 * here, so just reset the state.
+			 */
+			file_reset_write(f);
 			mnt_drop_write(mnt);
+		}
 	}
 	file_kill(f);
 	f->f_path.dentry = NULL;
diff -puN fs/super.c~keep-track-of-mnt_writer-state-of-struct-file fs/super.c
--- linux-2.6.git/fs/super.c~keep-track-of-mnt_writer-state-of-struct-file	2008-02-08 13:05:02.000000000 -0800
+++ linux-2.6.git-dave/fs/super.c	2008-02-08 13:05:02.000000000 -0800
@@ -578,6 +578,9 @@ retry:
 		if (!(f->f_mode & FMODE_WRITE))
 			continue;
 		f->f_mode &= ~FMODE_WRITE;
+		if (file_check_writeable(f) != 0)
+			continue;
+		file_release_write(f);
 		mnt = f->f_path.mnt;
 		file_list_unlock();
 		/*
diff -puN include/linux/fs.h~keep-track-of-mnt_writer-state-of-struct-file include/linux/fs.h
--- linux-2.6.git/include/linux/fs.h~keep-track-of-mnt_writer-state-of-struct-file	2008-02-08 13:05:02.000000000 -0800
+++ linux-2.6.git-dave/include/linux/fs.h	2008-02-08 13:05:02.000000000 -0800
@@ -776,6 +776,9 @@ static inline int ra_has_index(struct fi
 		index <  ra->start + ra->size);
 }
 
+#define FILE_MNT_WRITE_TAKEN	1
+#define FILE_MNT_WRITE_RELEASED	2
+
 struct file {
 	/*
 	 * fu_list becomes invalid after file_free is called and queued via
@@ -810,6 +813,9 @@ struct file {
 	spinlock_t		f_ep_lock;
 #endif /* #ifdef CONFIG_EPOLL */
 	struct address_space	*f_mapping;
+#ifdef CONFIG_DEBUG_WRITECOUNT
+	unsigned long f_mnt_write_state;
+#endif
 };
 extern spinlock_t files_lock;
 #define file_list_lock() spin_lock(&files_lock);
@@ -818,6 +824,49 @@ extern spinlock_t files_lock;
 #define get_file(x)	atomic_inc(&(x)->f_count)
 #define file_count(x)	atomic_read(&(x)->f_count)
 
+#ifdef CONFIG_DEBUG_WRITECOUNT
+static inline void file_take_write(struct file *f)
+{
+	WARN_ON(f->f_mnt_write_state != 0);
+	f->f_mnt_write_state = FILE_MNT_WRITE_TAKEN;
+}
+static inline void file_release_write(struct file *f)
+{
+	f->f_mnt_write_state |= FILE_MNT_WRITE_RELEASED;
+}
+static inline void file_reset_write(struct file *f)
+{
+	f->f_mnt_write_state = 0;
+}
+static inline void file_check_state(struct file *f)
+{
+	/*
+	 * At this point, either both or neither of these bits
+	 * should be set.
+	 */
+	WARN_ON(f->f_mnt_write_state == FILE_MNT_WRITE_TAKEN);
+	WARN_ON(f->f_mnt_write_state == FILE_MNT_WRITE_RELEASED);
+}
+static inline int file_check_writeable(struct file *f)
+{
+	if (f->f_mnt_write_state == FILE_MNT_WRITE_TAKEN)
+		return 0;
+	printk(KERN_WARNING "writeable file with no "
+			    "mnt_want_write()\n");
+	WARN_ON(1);
+	return -EINVAL;
+}
+#else /* !CONFIG_DEBUG_WRITECOUNT */
+static inline void file_take_write(struct file *filp) {}
+static inline void file_release_write(struct file *filp) {}
+static inline void file_reset_write(struct file *filp) {}
+static inline void file_check_state(struct file *filp) {}
+static inline int file_check_writeable(struct file *filp)
+{
+	return 0;
+}
+#endif /* CONFIG_DEBUG_WRITECOUNT */
+
 #define	MAX_NON_LFS	((1UL<<31) - 1)
 
 /* Page cache limit. The filesystems should put that into their s_maxbytes 
diff -puN lib/Kconfig.debug~keep-track-of-mnt_writer-state-of-struct-file lib/Kconfig.debug
--- linux-2.6.git/lib/Kconfig.debug~keep-track-of-mnt_writer-state-of-struct-file	2008-02-08 13:05:02.000000000 -0800
+++ linux-2.6.git-dave/lib/Kconfig.debug	2008-02-08 13:05:02.000000000 -0800
@@ -433,6 +433,16 @@ config DEBUG_VM
 
 	  If unsure, say N.
 
+config DEBUG_WRITECOUNT
+	bool "Debug filesystem writers count"
+	depends on DEBUG_KERNEL
+	help
+	  Enable this to catch wrong use of the writers count in struct
+	  vfsmount.  This will increase the size of each file struct by
+	  32 bits.
+
+	  If unsure, say N.
+
 config DEBUG_LIST
 	bool "Debug linked list manipulation"
 	depends on DEBUG_KERNEL
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 06/30] make open_namei() return a filp
  2008-02-08 22:26 ` [RFC][PATCH 06/30] make open_namei() return a filp Dave Hansen
@ 2008-02-09  5:09   ` Christoph Hellwig
  0 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2008-02-09  5:09 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, hch, viro, viro, miklos

On Fri, Feb 08, 2008 at 02:26:54PM -0800, Dave Hansen wrote:
> 
> open_namei() will, in the future, need to take mount write counts
> over its creation and truncation (via may_open()) operations.  It
> needs to keep these write counts until any potential filp that is
> created gets __fput()'d.
> 
> This gets complicated in the error handling and becomes very murky
> as to how far open_namei() actually got, and whether or not that
> mount write count was taken.
> 
> Creating the filps inside of open_namei() lets us shift the write
> count to be taken and released along with the filp.  We can hold
> a temporary write count during those creation and truncation
> operations, then release them once the write for the filp has
> been established.
> 
> Any caller who gets a 'struct file' back must consider that filp
> instantiated and fput() it normally.  The callers no longer
> have to worry about ever manually releasing a mnt write count.

This patch description is not correct anymore, open_namei is actually
gone with this patch as it got merged into do_filp_open.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 00/30] Read-only bind mounts (-mm resend)
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (29 preceding siblings ...)
  2008-02-08 22:27 ` [RFC][PATCH 30/30] r/o bind mounts: debugging for missed calls Dave Hansen
@ 2008-02-09  6:39 ` Christoph Hellwig
  2008-02-09  7:57 ` Al Viro
  2008-02-12  5:06 ` Christoph Hellwig
  32 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2008-02-09  6:39 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, hch, viro, viro, miklos

On Fri, Feb 08, 2008 at 02:26:41PM -0800, Dave Hansen wrote:
> This is against current Linus git
> (a4ffc0a0b240a29cbe489f6db9dae112a49ef1c1).
> 
> This rolls up all the -mm bugfixes that were accumulated, and
> addresses some new review comments from Al.  Also contains some
> reworking from hch and a patch from Jeff Dike.
> 
> Just posting here to let everyone have a sniff before we resend
> it back to -mm.

Except for the comment snafu in patch 8 this looks good to me.
Feel free to add


Signed-off-by: Christoph Hellwig <hch@lst.de>

as after all the reshuffling and tweaking I should share my fair share
of blame if things go wrong after all.

Do we really want to do another detour to -mm?  I think we should be
heading towards Linus' tree ASAP with this.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 00/30] Read-only bind mounts (-mm resend)
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (30 preceding siblings ...)
  2008-02-09  6:39 ` [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Christoph Hellwig
@ 2008-02-09  7:57 ` Al Viro
  2008-02-12  5:06 ` Christoph Hellwig
  32 siblings, 0 replies; 35+ messages in thread
From: Al Viro @ 2008-02-09  7:57 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, hch, viro, miklos

On Fri, Feb 08, 2008 at 02:26:41PM -0800, Dave Hansen wrote:
> This is against current Linus git
> (a4ffc0a0b240a29cbe489f6db9dae112a49ef1c1).
> 
> This rolls up all the -mm bugfixes that were accumulated, and
> addresses some new review comments from Al.  Also contains some
> reworking from hch and a patch from Jeff Dike.
> 
> Just posting here to let everyone have a sniff before we resend
> it back to -mm.

ACKed.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC][PATCH 00/30] Read-only bind mounts (-mm resend)
  2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
                   ` (31 preceding siblings ...)
  2008-02-09  7:57 ` Al Viro
@ 2008-02-12  5:06 ` Christoph Hellwig
  32 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2008-02-12  5:06 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, hch, viro, viro, miklos

On Fri, Feb 08, 2008 at 02:26:41PM -0800, Dave Hansen wrote:
> This is against current Linus git
> (a4ffc0a0b240a29cbe489f6db9dae112a49ef1c1).
> 
> This rolls up all the -mm bugfixes that were accumulated, and
> addresses some new review comments from Al.  Also contains some
> reworking from hch and a patch from Jeff Dike.

Dave, you you please send this on to Linus ASAP?  It's getting really
late in the .25 circle.


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2008-02-12  5:06 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-08 22:26 [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Dave Hansen
2008-02-08 22:26 ` [RFC][PATCH 01/30] reiserfs: eliminate private use of struct file in xattr Dave Hansen
2008-02-08 22:26 ` [RFC][PATCH 02/30] hppfs pass vfsmount to dentry_open() Dave Hansen
2008-02-08 22:26 ` [RFC][PATCH 03/30] check for null vfsmount in dentry_open() Dave Hansen
2008-02-08 22:26 ` [RFC][PATCH 04/30] fix up new filp allocators Dave Hansen
2008-02-08 22:26 ` [RFC][PATCH 05/30] do namei_flags calculation inside open_namei() Dave Hansen
2008-02-08 22:26 ` [RFC][PATCH 06/30] make open_namei() return a filp Dave Hansen
2008-02-09  5:09   ` Christoph Hellwig
2008-02-08 22:26 ` [RFC][PATCH 07/30] r/o bind mounts: stub functions Dave Hansen
2008-02-08 22:26 ` [RFC][PATCH 08/30] r/o bind mounts: create helper to drop file write access Dave Hansen
2008-02-08 22:26 ` [RFC][PATCH 09/30] r/o bind mounts: drop write during emergency remount Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 10/30] r/o bind mounts: elevate write count for vfs_rmdir() Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 11/30] r/o bind mounts: elevate write count for callers of vfs_mkdir() Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 12/30] r/o bind mounts: elevate mnt_writers for unlink callers Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 13/30] r/o bind mounts: elevate write count for xattr_permission() callers Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 14/30] r/o bind mounts: elevate write count for ncp_ioctl() Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 15/30] r/o bind mounts: write counts for time functions Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 16/30] r/o bind mounts: elevate write count for do_utimes() Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 17/30] r/o bind mounts: write count for file_update_time() Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 18/30] r/o bind mounts: write counts for link/symlink Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 19/30] r/o bind mounts: elevate write count for ioctls() Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 20/30] r/o bind mounts: elevate write count for open()s Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 21/30] r/o bind mounts: get write access for vfs_rename() callers Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 22/30] r/o bind mounts: elevate write count for chmod/chown callers Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 23/30] r/o bind mounts: write counts for truncate() Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 24/30] r/o bind mounts: elevate count for xfs timestamp updates Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 25/30] r/o bind mounts: make access() use new r/o helper Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 26/30] r/o bind mounts: check mnt instead of superblock directly Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 27/30] r/o bind mounts: get callers of vfs_mknod/create() Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 28/30] r/o bind mounts: track numbers of writers to mounts Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 29/30] r/o bind mounts: honor mount writer counts at remount Dave Hansen
2008-02-08 22:27 ` [RFC][PATCH 30/30] r/o bind mounts: debugging for missed calls Dave Hansen
2008-02-09  6:39 ` [RFC][PATCH 00/30] Read-only bind mounts (-mm resend) Christoph Hellwig
2008-02-09  7:57 ` Al Viro
2008-02-12  5:06 ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).