linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] nfsd: make is works in a container
@ 2012-12-06 15:34 Stanislav Kinsbursky
  2012-12-06 15:34 ` [PATCH 1/6] nfsd: pass proper net to nfsd_destroy() from NFSd kthreads Stanislav Kinsbursky
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-06 15:34 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-kernel, devel

This patch set finally enables NFSd in container.
I've tested it in container with it's own root, and also pid, net and mount
namespaces.

There are some limitations, which are listed below:
1) only nfsdclt client tracker supported for container. It's deprecated and
going to be removed soon. UMH tracker requires switching root. Legacy tracker
requires something like RB tree of opened inodes to make sure, that any
recovery directory will be opened only once.
2) Enabled versions are controlled globally, which is should be fixed.
3) Server should be stopped by writing "0" to
/proc/fs/nfsd/threads instead of sending signals to NFSd threads (they are
working in init_pid). Sending signals will either won't work if container wich
its own pid namespace, or will kill all nfsd threads for all containers in
init_pid namesapce.
4) Currently, if container was stopped without stopping NFS server (i.e. it's
init was killed), NFSd kthreads will remain running. One of possible solutions
is to not hold network by NFSd service sockets, but register oer-net callback
and kill all the threads on network namespace exit.
5) NFSd filesystem superblock holds network namespace. I.e. if some process
will hold container's NFSd supeblock, then sthe whole container's network
naemspace will stay alive even is container is destroyed already.

There may be more limitations, which are not clear to me yet.

The following series implements...

---

Stanislav Kinsbursky (6):
      nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
      nfsd: swap fs root in NFSd kthreads
      nfsd: make containerise NFSd filesystem
      nfsd: use proper net while reading "exports" file
      nfsd: disable usermode helper client tracker in container
      nfsd: enable NFSv4 state in containers


 fs/nfsd/netns.h       |    1 +
 fs/nfsd/nfs4recover.c |    6 ++++
 fs/nfsd/nfs4state.c   |   10 ------
 fs/nfsd/nfsctl.c      |   77 +++++++++++++++++++++++++++++++++++++------------
 fs/nfsd/nfssvc.c      |   42 ++++++++++++++++++++++++---
 5 files changed, 102 insertions(+), 34 deletions(-)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/6] nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
  2012-12-06 15:34 [PATCH 0/6] nfsd: make is works in a container Stanislav Kinsbursky
@ 2012-12-06 15:34 ` Stanislav Kinsbursky
  2012-12-06 15:34 ` [PATCH 2/6] nfsd: swap fs root in " Stanislav Kinsbursky
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-06 15:34 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-kernel, devel

Since NFSd service is per-net now, we have to pass proper network context in
nfsd_shutdown() from NFSd kthreads.
The simlies way I found is to get proper net from one of transports with
permanent sockets.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
 fs/nfsd/nfssvc.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 2cfd9c6..cee62ab 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -541,6 +541,8 @@ static int
 nfsd(void *vrqstp)
 {
 	struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp;
+	struct svc_xprt *perm_sock = list_entry(rqstp->rq_server->sv_permsocks.next, typeof(struct svc_xprt), xpt_list);
+	struct net *net = perm_sock->xpt_net;
 	int err;
 
 	/* Lock module and set up kernel thread */
@@ -605,7 +607,7 @@ out:
 	/* Release the thread */
 	svc_exit_thread(rqstp);
 
-	nfsd_destroy(&init_net);
+	nfsd_destroy(net);
 
 	/* Release module */
 	mutex_unlock(&nfsd_mutex);


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-06 15:34 [PATCH 0/6] nfsd: make is works in a container Stanislav Kinsbursky
  2012-12-06 15:34 ` [PATCH 1/6] nfsd: pass proper net to nfsd_destroy() from NFSd kthreads Stanislav Kinsbursky
@ 2012-12-06 15:34 ` Stanislav Kinsbursky
  2012-12-10 20:28   ` J. Bruce Fields
  2012-12-06 15:34 ` [PATCH 3/6] nfsd: make containerise NFSd filesystem Stanislav Kinsbursky
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-06 15:34 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-kernel, devel

NFSd does lookup. Lookup is done starting from current->fs->root.
NFSd is a kthread, cloned by kthreadd, and thus have global (but luckely
unshared) root.
So we have to swap root to those, which process, started NFSd, has. Because
that process can be in a container with it's own root.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
 fs/nfsd/netns.h  |    1 +
 fs/nfsd/nfssvc.c |   33 ++++++++++++++++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index abfc97c..5c423c6 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -101,6 +101,7 @@ struct nfsd_net {
 	struct timeval nfssvc_boot;
 
 	struct svc_serv *nfsd_serv;
+	struct path	root;
 };
 
 extern int nfsd_net_id;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index cee62ab..177bb60 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -392,6 +392,7 @@ int nfsd_create_serv(struct net *net)
 
 	set_max_drc();
 	do_gettimeofday(&nn->nfssvc_boot);		/* record boot time */
+	get_fs_root(current->fs, &nn->root);
 	return 0;
 }
 
@@ -426,8 +427,10 @@ void nfsd_destroy(struct net *net)
 	if (destroy)
 		svc_shutdown_net(nn->nfsd_serv, net);
 	svc_destroy(nn->nfsd_serv);
-	if (destroy)
+	if (destroy) {
+		path_put(&nn->root);
 		nn->nfsd_serv = NULL;
+	}
 }
 
 int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
@@ -533,6 +536,25 @@ out:
 	return error;
 }
 
+/*
+ * This function is actually slightly modified set_fs_root()<F9>
+ */
+static void nfsd_swap_root(struct net *net)
+{
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+	struct fs_struct *fs = current->fs;
+	struct path old_root;
+
+	path_get(&nn->root);
+	spin_lock(&fs->lock);
+	write_seqcount_begin(&fs->seq);
+	old_root = fs->root;
+	fs->root = nn->root;
+	write_seqcount_end(&fs->seq);
+	spin_unlock(&fs->lock);
+	if (old_root.dentry)
+		path_put(&old_root);
+}
 
 /*
  * This is the NFS server kernel thread
@@ -559,6 +581,15 @@ nfsd(void *vrqstp)
 	current->fs->umask = 0;
 
 	/*
+	 * We have to swap NFSd kthread's fs->root.
+	 * Why so? Because NFSd can be started in container, which has it's own
+	 * root.
+	 * And so what? NFSd lookup files, and lookup start from
+	 * current->fs->root.
+	 */
+	nfsd_swap_root(net);
+
+	/*
 	 * thread is spawned with all signals set to SIG_IGN, re-enable
 	 * the ones that will bring down the thread
 	 */


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/6] nfsd: make containerise NFSd filesystem
  2012-12-06 15:34 [PATCH 0/6] nfsd: make is works in a container Stanislav Kinsbursky
  2012-12-06 15:34 ` [PATCH 1/6] nfsd: pass proper net to nfsd_destroy() from NFSd kthreads Stanislav Kinsbursky
  2012-12-06 15:34 ` [PATCH 2/6] nfsd: swap fs root in " Stanislav Kinsbursky
@ 2012-12-06 15:34 ` Stanislav Kinsbursky
  2012-12-06 15:34 ` [PATCH 4/6] nfsd: use proper net while reading "exports" file Stanislav Kinsbursky
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-06 15:34 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-kernel, devel

This patch makes NFSD file system superblock to be created per net.
This makes possible to get proper network namespace from superblock instead of
using hard-coded "init_net".

Note: NFSd fs super-block holds network namespace. This garantees, that
network namespace won't disappear from underneath of it.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
 fs/nfsd/nfsctl.c |   46 +++++++++++++++++++++++++++++++++-------------
 fs/nfsd/nfssvc.c |    5 ++---
 2 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 7493428..873438f 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -220,6 +220,7 @@ static ssize_t write_unlock_ip(struct file *file, char *buf, size_t size)
 	struct sockaddr *sap = (struct sockaddr *)&address;
 	size_t salen = sizeof(address);
 	char *fo_path;
+	struct net *net = file->f_dentry->d_sb->s_fs_info;
 
 	/* sanity check */
 	if (size == 0)
@@ -232,7 +233,7 @@ static ssize_t write_unlock_ip(struct file *file, char *buf, size_t size)
 	if (qword_get(&buf, fo_path, size) < 0)
 		return -EINVAL;
 
-	if (rpc_pton(&init_net, fo_path, size, sap, salen) == 0)
+	if (rpc_pton(net, fo_path, size, sap, salen) == 0)
 		return -EINVAL;
 
 	return nlmsvc_unlock_all_by_ip(sap);
@@ -317,6 +318,7 @@ static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
 	int len;
 	struct auth_domain *dom;
 	struct knfsd_fh fh;
+	struct net *net = file->f_dentry->d_sb->s_fs_info;
 
 	if (size == 0)
 		return -EINVAL;
@@ -352,7 +354,7 @@ static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
 	if (!dom)
 		return -ENOMEM;
 
-	len = exp_rootfh(&init_net, dom, path, &fh,  maxsize);
+	len = exp_rootfh(net, dom, path, &fh,  maxsize);
 	auth_domain_put(dom);
 	if (len)
 		return len;
@@ -396,7 +398,7 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
 {
 	char *mesg = buf;
 	int rv;
-	struct net *net = &init_net;
+	struct net *net = file->f_dentry->d_sb->s_fs_info;
 
 	if (size > 0) {
 		int newthreads;
@@ -447,7 +449,7 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size)
 	int len;
 	int npools;
 	int *nthreads;
-	struct net *net = &init_net;
+	struct net *net = file->f_dentry->d_sb->s_fs_info;
 
 	mutex_lock(&nfsd_mutex);
 	npools = nfsd_nrpools(net);
@@ -510,7 +512,7 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size)
 	unsigned minor;
 	ssize_t tlen = 0;
 	char *sep;
-	struct net *net = &init_net;
+	struct net *net = file->f_dentry->d_sb->s_fs_info;
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
 	if (size>0) {
@@ -792,7 +794,7 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size,
 static ssize_t write_ports(struct file *file, char *buf, size_t size)
 {
 	ssize_t rv;
-	struct net *net = &init_net;
+	struct net *net = file->f_dentry->d_sb->s_fs_info;
 
 	mutex_lock(&nfsd_mutex);
 	rv = __write_ports(file, buf, size, net);
@@ -827,7 +829,7 @@ int nfsd_max_blksize;
 static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
 {
 	char *mesg = buf;
-	struct net *net = &init_net;
+	struct net *net = file->f_dentry->d_sb->s_fs_info;
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
 	if (size > 0) {
@@ -923,7 +925,8 @@ static ssize_t nfsd4_write_time(struct file *file, char *buf, size_t size,
  */
 static ssize_t write_leasetime(struct file *file, char *buf, size_t size)
 {
-	struct nfsd_net *nn = net_generic(&init_net, nfsd_net_id);
+	struct net *net = file->f_dentry->d_sb->s_fs_info;
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 	return nfsd4_write_time(file, buf, size, &nn->nfsd4_lease, nn);
 }
 
@@ -939,7 +942,8 @@ static ssize_t write_leasetime(struct file *file, char *buf, size_t size)
  */
 static ssize_t write_gracetime(struct file *file, char *buf, size_t size)
 {
-	struct nfsd_net *nn = net_generic(&init_net, nfsd_net_id);
+	struct net *net = file->f_dentry->d_sb->s_fs_info;
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 	return nfsd4_write_time(file, buf, size, &nn->nfsd4_grace, nn);
 }
 
@@ -995,7 +999,8 @@ static ssize_t __write_recoverydir(struct file *file, char *buf, size_t size,
 static ssize_t write_recoverydir(struct file *file, char *buf, size_t size)
 {
 	ssize_t rv;
-	struct nfsd_net *nn = net_generic(&init_net, nfsd_net_id);
+	struct net *net = file->f_dentry->d_sb->s_fs_info;
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
 	mutex_lock(&nfsd_mutex);
 	rv = __write_recoverydir(file, buf, size, nn);
@@ -1037,20 +1042,35 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
 #endif
 		/* last one */ {""}
 	};
-	return simple_fill_super(sb, 0x6e667364, nfsd_files);
+	struct net *net = data;
+	int ret;
+
+	ret = simple_fill_super(sb, 0x6e667364, nfsd_files);
+	if (ret)
+		return ret;
+	sb->s_fs_info = get_net(net);
+	return 0;
 }
 
 static struct dentry *nfsd_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
-	return mount_single(fs_type, flags, data, nfsd_fill_super);
+	return mount_ns(fs_type, flags, current->nsproxy->net_ns, nfsd_fill_super);
+}
+
+static void nfsd_umount(struct super_block *sb)
+{
+	struct net *net = sb->s_fs_info;
+
+	kill_litter_super(sb);
+	put_net(net);
 }
 
 static struct file_system_type nfsd_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "nfsd",
 	.mount		= nfsd_mount,
-	.kill_sb	= kill_litter_super,
+	.kill_sb	= nfsd_umount,
 };
 
 #ifdef CONFIG_PROC_FS
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 177bb60..bf1c4d3 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -734,8 +734,7 @@ nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp)
 int nfsd_pool_stats_open(struct inode *inode, struct file *file)
 {
 	int ret;
-	struct net *net = &init_net;
-	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+	struct nfsd_net *nn = net_generic(inode->i_sb->s_fs_info, nfsd_net_id);
 
 	mutex_lock(&nfsd_mutex);
 	if (nn->nfsd_serv == NULL) {
@@ -752,7 +751,7 @@ int nfsd_pool_stats_open(struct inode *inode, struct file *file)
 int nfsd_pool_stats_release(struct inode *inode, struct file *file)
 {
 	int ret = seq_release(inode, file);
-	struct net *net = &init_net;
+	struct net *net = inode->i_sb->s_fs_info;
 
 	mutex_lock(&nfsd_mutex);
 	/* this function really, really should have been called svc_put() */


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/6] nfsd: use proper net while reading "exports" file
  2012-12-06 15:34 [PATCH 0/6] nfsd: make is works in a container Stanislav Kinsbursky
                   ` (2 preceding siblings ...)
  2012-12-06 15:34 ` [PATCH 3/6] nfsd: make containerise NFSd filesystem Stanislav Kinsbursky
@ 2012-12-06 15:34 ` Stanislav Kinsbursky
  2012-12-06 15:35 ` [PATCH 5/6] nfsd: disable usermode helper client tracker in container Stanislav Kinsbursky
  2012-12-06 15:35 ` [PATCH 6/6] nfsd: enable NFSv4 state in containers Stanislav Kinsbursky
  5 siblings, 0 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-06 15:34 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-kernel, devel

Functuon "exports_open" is used for both "/proc/fs/nfs/exports" and
"/proc/fs/nfsd/exports" files.
Now NFSd filesystem is containerised, so proper net can be taken from
superblock for "/proc/fs/nfsd/exports" reader.
But for "/proc/fs/nfsd/exports" only current->nsproxy->net_ns can be used.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
 fs/nfsd/nfsctl.c |   31 +++++++++++++++++++++++++------
 1 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 873438f..c36510e 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -125,11 +125,11 @@ static const struct file_operations transaction_ops = {
 	.llseek		= default_llseek,
 };
 
-static int exports_open(struct inode *inode, struct file *file)
+static int exports_net_open(struct net *net, struct file *file)
 {
 	int err;
 	struct seq_file *seq;
-	struct nfsd_net *nn = net_generic(&init_net, nfsd_net_id);
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
 	err = seq_open(file, &nfs_exports_op);
 	if (err)
@@ -140,8 +140,26 @@ static int exports_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static const struct file_operations exports_operations = {
-	.open		= exports_open,
+static int exports_proc_open(struct inode *inode, struct file *file)
+{
+	return exports_net_open(current->nsproxy->net_ns, file);
+}
+
+static const struct file_operations exports_proc_operations = {
+	.open		= exports_proc_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+	.owner		= THIS_MODULE,
+};
+
+static int exports_nfsd_open(struct inode *inode, struct file *file)
+{
+	return exports_net_open(inode->i_sb->s_fs_info, file);
+}
+
+static const struct file_operations exports_nfsd_operations = {
+	.open		= exports_nfsd_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
 	.release	= seq_release,
@@ -1018,7 +1036,7 @@ static ssize_t write_recoverydir(struct file *file, char *buf, size_t size)
 static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
 {
 	static struct tree_descr nfsd_files[] = {
-		[NFSD_List] = {"exports", &exports_operations, S_IRUGO},
+		[NFSD_List] = {"exports", &exports_nfsd_operations, S_IRUGO},
 		[NFSD_Export_features] = {"export_features",
 					&export_features_operations, S_IRUGO},
 		[NFSD_FO_UnlockIP] = {"unlock_ip",
@@ -1081,7 +1099,8 @@ static int create_proc_exports_entry(void)
 	entry = proc_mkdir("fs/nfs", NULL);
 	if (!entry)
 		return -ENOMEM;
-	entry = proc_create("exports", 0, entry, &exports_operations);
+	entry = proc_create("exports", 0, entry,
+				 &exports_proc_operations);
 	if (!entry)
 		return -ENOMEM;
 	return 0;


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/6] nfsd: disable usermode helper client tracker in container
  2012-12-06 15:34 [PATCH 0/6] nfsd: make is works in a container Stanislav Kinsbursky
                   ` (3 preceding siblings ...)
  2012-12-06 15:34 ` [PATCH 4/6] nfsd: use proper net while reading "exports" file Stanislav Kinsbursky
@ 2012-12-06 15:35 ` Stanislav Kinsbursky
  2012-12-06 15:35 ` [PATCH 6/6] nfsd: enable NFSv4 state in containers Stanislav Kinsbursky
  5 siblings, 0 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-06 15:35 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-kernel, devel

This tracker uses khelper kthread to execute binaries.
Execution itself is done from kthread context - i.e. global root is used.
This is not suitable for containers with own root.
So, disable this tracker for a while.

Note: one of possible solutions can be pass "init" callback to khelper, which
will swap root to desired one.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
 fs/nfsd/nfs4recover.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index ba6fdd4..e0ae1cf 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -1185,6 +1185,12 @@ bin_to_hex_dup(const unsigned char *src, int srclen)
 static int
 nfsd4_umh_cltrack_init(struct net __attribute__((unused)) *net)
 {
+	/* XXX: The usermode helper s not working in container yet. */
+	if (net != &init_net) {
+		WARN(1, KERN_ERR "NFSD: attempt to initialize umh client "
+			"tracking in a container!\n");
+		return -EINVAL;
+	}
 	return nfsd4_umh_cltrack_upcall("init", NULL, NULL);
 }
 


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/6] nfsd: enable NFSv4 state in containers
  2012-12-06 15:34 [PATCH 0/6] nfsd: make is works in a container Stanislav Kinsbursky
                   ` (4 preceding siblings ...)
  2012-12-06 15:35 ` [PATCH 5/6] nfsd: disable usermode helper client tracker in container Stanislav Kinsbursky
@ 2012-12-06 15:35 ` Stanislav Kinsbursky
  5 siblings, 0 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-06 15:35 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-kernel, devel

Currently, NFSd is ready to operate in network namespace based containers.
So let's drop check for "init_net" and make it able to fly.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
 fs/nfsd/nfs4state.c |   10 ----------
 1 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 39605c1..b3b6daa 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4866,16 +4866,6 @@ nfs4_state_start_net(struct net *net)
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 	int ret;
 
-	/*
-	 * FIXME: For now, we hang most of the pernet global stuff off of
-	 * init_net until nfsd is fully containerized. Eventually, we'll
-	 * need to pass a net pointer into this function, take a reference
-	 * to that instead and then do most of the rest of this on a per-net
-	 * basis.
-	 */
-	if (net != &init_net)
-		return -EINVAL;
-
 	ret = nfs4_state_create_net(net);
 	if (ret)
 		return ret;


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-06 15:34 ` [PATCH 2/6] nfsd: swap fs root in " Stanislav Kinsbursky
@ 2012-12-10 20:28   ` J. Bruce Fields
  2012-12-11 14:00     ` Stanislav Kinsbursky
  0 siblings, 1 reply; 24+ messages in thread
From: J. Bruce Fields @ 2012-12-10 20:28 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: linux-nfs, linux-kernel, devel

On Thu, Dec 06, 2012 at 06:34:47PM +0300, Stanislav Kinsbursky wrote:
> NFSd does lookup. Lookup is done starting from current->fs->root.
> NFSd is a kthread, cloned by kthreadd, and thus have global (but luckely
> unshared) root.
> So we have to swap root to those, which process, started NFSd, has. Because
> that process can be in a container with it's own root.

This doesn't sound right to me.

Which lookups exactly do you see being done relative to
current->fs->root ?

--b.

> 
> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
> ---
>  fs/nfsd/netns.h  |    1 +
>  fs/nfsd/nfssvc.c |   33 ++++++++++++++++++++++++++++++++-
>  2 files changed, 33 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index abfc97c..5c423c6 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -101,6 +101,7 @@ struct nfsd_net {
>  	struct timeval nfssvc_boot;
>  
>  	struct svc_serv *nfsd_serv;
> +	struct path	root;
>  };
>  
>  extern int nfsd_net_id;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index cee62ab..177bb60 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -392,6 +392,7 @@ int nfsd_create_serv(struct net *net)
>  
>  	set_max_drc();
>  	do_gettimeofday(&nn->nfssvc_boot);		/* record boot time */
> +	get_fs_root(current->fs, &nn->root);
>  	return 0;
>  }
>  
> @@ -426,8 +427,10 @@ void nfsd_destroy(struct net *net)
>  	if (destroy)
>  		svc_shutdown_net(nn->nfsd_serv, net);
>  	svc_destroy(nn->nfsd_serv);
> -	if (destroy)
> +	if (destroy) {
> +		path_put(&nn->root);
>  		nn->nfsd_serv = NULL;
> +	}
>  }
>  
>  int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
> @@ -533,6 +536,25 @@ out:
>  	return error;
>  }
>  
> +/*
> + * This function is actually slightly modified set_fs_root()<F9>
> + */
> +static void nfsd_swap_root(struct net *net)
> +{
> +	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> +	struct fs_struct *fs = current->fs;
> +	struct path old_root;
> +
> +	path_get(&nn->root);
> +	spin_lock(&fs->lock);
> +	write_seqcount_begin(&fs->seq);
> +	old_root = fs->root;
> +	fs->root = nn->root;
> +	write_seqcount_end(&fs->seq);
> +	spin_unlock(&fs->lock);
> +	if (old_root.dentry)
> +		path_put(&old_root);
> +}
>  
>  /*
>   * This is the NFS server kernel thread
> @@ -559,6 +581,15 @@ nfsd(void *vrqstp)
>  	current->fs->umask = 0;
>  
>  	/*
> +	 * We have to swap NFSd kthread's fs->root.
> +	 * Why so? Because NFSd can be started in container, which has it's own
> +	 * root.
> +	 * And so what? NFSd lookup files, and lookup start from
> +	 * current->fs->root.
> +	 */
> +	nfsd_swap_root(net);
> +
> +	/*
>  	 * thread is spawned with all signals set to SIG_IGN, re-enable
>  	 * the ones that will bring down the thread
>  	 */
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-10 20:28   ` J. Bruce Fields
@ 2012-12-11 14:00     ` Stanislav Kinsbursky
  2012-12-11 14:12       ` [Devel] " Stanislav Kinsbursky
  2012-12-11 14:54       ` Al Viro
  0 siblings, 2 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-11 14:00 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel, devel

11.12.2012 00:28, J. Bruce Fields пишет:
> On Thu, Dec 06, 2012 at 06:34:47PM +0300, Stanislav Kinsbursky wrote:
>> NFSd does lookup. Lookup is done starting from current->fs->root.
>> NFSd is a kthread, cloned by kthreadd, and thus have global (but luckely
>> unshared) root.
>> So we have to swap root to those, which process, started NFSd, has. Because
>> that process can be in a container with it's own root.
>
> This doesn't sound right to me.
>
> Which lookups exactly do you see being done relative to
> current->fs->root ?
>

Ok, you are right. I was mistaken here.
This is not a exactly lookup, but d_path() problem in svc_export_request().
I.e. without root swapping, d_path() will give not local export path (like "/export")
but something like this "/root/containers_root/export".

> --b.
>
>>
>> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
>> ---
>>   fs/nfsd/netns.h  |    1 +
>>   fs/nfsd/nfssvc.c |   33 ++++++++++++++++++++++++++++++++-
>>   2 files changed, 33 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
>> index abfc97c..5c423c6 100644
>> --- a/fs/nfsd/netns.h
>> +++ b/fs/nfsd/netns.h
>> @@ -101,6 +101,7 @@ struct nfsd_net {
>>   	struct timeval nfssvc_boot;
>>
>>   	struct svc_serv *nfsd_serv;
>> +	struct path	root;
>>   };
>>
>>   extern int nfsd_net_id;
>> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
>> index cee62ab..177bb60 100644
>> --- a/fs/nfsd/nfssvc.c
>> +++ b/fs/nfsd/nfssvc.c
>> @@ -392,6 +392,7 @@ int nfsd_create_serv(struct net *net)
>>
>>   	set_max_drc();
>>   	do_gettimeofday(&nn->nfssvc_boot);		/* record boot time */
>> +	get_fs_root(current->fs, &nn->root);
>>   	return 0;
>>   }
>>
>> @@ -426,8 +427,10 @@ void nfsd_destroy(struct net *net)
>>   	if (destroy)
>>   		svc_shutdown_net(nn->nfsd_serv, net);
>>   	svc_destroy(nn->nfsd_serv);
>> -	if (destroy)
>> +	if (destroy) {
>> +		path_put(&nn->root);
>>   		nn->nfsd_serv = NULL;
>> +	}
>>   }
>>
>>   int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
>> @@ -533,6 +536,25 @@ out:
>>   	return error;
>>   }
>>
>> +/*
>> + * This function is actually slightly modified set_fs_root()<F9>
>> + */
>> +static void nfsd_swap_root(struct net *net)
>> +{
>> +	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>> +	struct fs_struct *fs = current->fs;
>> +	struct path old_root;
>> +
>> +	path_get(&nn->root);
>> +	spin_lock(&fs->lock);
>> +	write_seqcount_begin(&fs->seq);
>> +	old_root = fs->root;
>> +	fs->root = nn->root;
>> +	write_seqcount_end(&fs->seq);
>> +	spin_unlock(&fs->lock);
>> +	if (old_root.dentry)
>> +		path_put(&old_root);
>> +}
>>
>>   /*
>>    * This is the NFS server kernel thread
>> @@ -559,6 +581,15 @@ nfsd(void *vrqstp)
>>   	current->fs->umask = 0;
>>
>>   	/*
>> +	 * We have to swap NFSd kthread's fs->root.
>> +	 * Why so? Because NFSd can be started in container, which has it's own
>> +	 * root.
>> +	 * And so what? NFSd lookup files, and lookup start from
>> +	 * current->fs->root.
>> +	 */
>> +	nfsd_swap_root(net);
>> +
>> +	/*
>>   	 * thread is spawned with all signals set to SIG_IGN, re-enable
>>   	 * the ones that will bring down the thread
>>   	 */
>>


-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-11 14:00     ` Stanislav Kinsbursky
@ 2012-12-11 14:12       ` Stanislav Kinsbursky
  2012-12-11 14:51         ` Stanislav Kinsbursky
  2012-12-11 14:56         ` J. Bruce Fields
  2012-12-11 14:54       ` Al Viro
  1 sibling, 2 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-11 14:12 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel, devel

11.12.2012 18:00, Stanislav Kinsbursky пишет:
> 11.12.2012 00:28, J. Bruce Fields пишет:
>> On Thu, Dec 06, 2012 at 06:34:47PM +0300, Stanislav Kinsbursky wrote:
>>> NFSd does lookup. Lookup is done starting from current->fs->root.
>>> NFSd is a kthread, cloned by kthreadd, and thus have global (but luckely
>>> unshared) root.
>>> So we have to swap root to those, which process, started NFSd, has. Because
>>> that process can be in a container with it's own root.
>>
>> This doesn't sound right to me.
>>
>> Which lookups exactly do you see being done relative to
>> current->fs->root ?
>>
>
> Ok, you are right. I was mistaken here.
> This is not a exactly lookup, but d_path() problem in svc_export_request().
> I.e. without root swapping, d_path() will give not local export path (like "/export")
> but something like this "/root/containers_root/export".
>

We, actually, can do it less in less aggressive way.
I.e. instead root swap and current svc_export_request() implementation:

void svc_export_request(...)
{
	<snip>
         pth = d_path(&exp->ex_path, *bpp, *blen);
	<snip>
}

we can do something like this:

void svc_export_request(...)
{
	struct nfsd_net *nn = ...
	<snip>
	spin_lock(&dcache_lock);
         pth = __d_path(&exp->ex_path, &nn->root, *bpp, *blen);
	spin_unlock(&dcache_lock);
	<snip>
}

>> --b.
>>
>>>
>>> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
>>> ---
>>>   fs/nfsd/netns.h  |    1 +
>>>   fs/nfsd/nfssvc.c |   33 ++++++++++++++++++++++++++++++++-
>>>   2 files changed, 33 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
>>> index abfc97c..5c423c6 100644
>>> --- a/fs/nfsd/netns.h
>>> +++ b/fs/nfsd/netns.h
>>> @@ -101,6 +101,7 @@ struct nfsd_net {
>>>       struct timeval nfssvc_boot;
>>>
>>>       struct svc_serv *nfsd_serv;
>>> +    struct path    root;
>>>   };
>>>
>>>   extern int nfsd_net_id;
>>> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
>>> index cee62ab..177bb60 100644
>>> --- a/fs/nfsd/nfssvc.c
>>> +++ b/fs/nfsd/nfssvc.c
>>> @@ -392,6 +392,7 @@ int nfsd_create_serv(struct net *net)
>>>
>>>       set_max_drc();
>>>       do_gettimeofday(&nn->nfssvc_boot);        /* record boot time */
>>> +    get_fs_root(current->fs, &nn->root);
>>>       return 0;
>>>   }
>>>
>>> @@ -426,8 +427,10 @@ void nfsd_destroy(struct net *net)
>>>       if (destroy)
>>>           svc_shutdown_net(nn->nfsd_serv, net);
>>>       svc_destroy(nn->nfsd_serv);
>>> -    if (destroy)
>>> +    if (destroy) {
>>> +        path_put(&nn->root);
>>>           nn->nfsd_serv = NULL;
>>> +    }
>>>   }
>>>
>>>   int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
>>> @@ -533,6 +536,25 @@ out:
>>>       return error;
>>>   }
>>>
>>> +/*
>>> + * This function is actually slightly modified set_fs_root()<F9>
>>> + */
>>> +static void nfsd_swap_root(struct net *net)
>>> +{
>>> +    struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>> +    struct fs_struct *fs = current->fs;
>>> +    struct path old_root;
>>> +
>>> +    path_get(&nn->root);
>>> +    spin_lock(&fs->lock);
>>> +    write_seqcount_begin(&fs->seq);
>>> +    old_root = fs->root;
>>> +    fs->root = nn->root;
>>> +    write_seqcount_end(&fs->seq);
>>> +    spin_unlock(&fs->lock);
>>> +    if (old_root.dentry)
>>> +        path_put(&old_root);
>>> +}
>>>
>>>   /*
>>>    * This is the NFS server kernel thread
>>> @@ -559,6 +581,15 @@ nfsd(void *vrqstp)
>>>       current->fs->umask = 0;
>>>
>>>       /*
>>> +     * We have to swap NFSd kthread's fs->root.
>>> +     * Why so? Because NFSd can be started in container, which has it's own
>>> +     * root.
>>> +     * And so what? NFSd lookup files, and lookup start from
>>> +     * current->fs->root.
>>> +     */
>>> +    nfsd_swap_root(net);
>>> +
>>> +    /*
>>>        * thread is spawned with all signals set to SIG_IGN, re-enable
>>>        * the ones that will bring down the thread
>>>        */
>>>
>
>


-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-11 14:12       ` [Devel] " Stanislav Kinsbursky
@ 2012-12-11 14:51         ` Stanislav Kinsbursky
  2012-12-11 14:56         ` J. Bruce Fields
  1 sibling, 0 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-11 14:51 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel, devel

11.12.2012 18:12, Stanislav Kinsbursky пишет:
> 11.12.2012 18:00, Stanislav Kinsbursky пишет:
>> 11.12.2012 00:28, J. Bruce Fields пишет:
>>> On Thu, Dec 06, 2012 at 06:34:47PM +0300, Stanislav Kinsbursky wrote:
>>>> NFSd does lookup. Lookup is done starting from current->fs->root.
>>>> NFSd is a kthread, cloned by kthreadd, and thus have global (but luckely
>>>> unshared) root.
>>>> So we have to swap root to those, which process, started NFSd, has. Because
>>>> that process can be in a container with it's own root.
>>>
>>> This doesn't sound right to me.
>>>
>>> Which lookups exactly do you see being done relative to
>>> current->fs->root ?
>>>
>>
>> Ok, you are right. I was mistaken here.
>> This is not a exactly lookup, but d_path() problem in svc_export_request().
>> I.e. without root swapping, d_path() will give not local export path (like "/export")
>> but something like this "/root/containers_root/export".
>>
>
> We, actually, can do it less in less aggressive way.
> I.e. instead root swap and current svc_export_request() implementation:
>
> void svc_export_request(...)
> {
>      <snip>
>          pth = d_path(&exp->ex_path, *bpp, *blen);
>      <snip>
> }
>
> we can do something like this:
>
> void svc_export_request(...)
> {
>      struct nfsd_net *nn = ...
>      <snip>
>      spin_lock(&dcache_lock);
>          pth = __d_path(&exp->ex_path, &nn->root, *bpp, *blen);
>      spin_unlock(&dcache_lock);
>      <snip>
> }
>

No, this won't work. Sorry for noise.

>>> --b.
>>>
>>>>
>>>> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
>>>> ---
>>>>   fs/nfsd/netns.h  |    1 +
>>>>   fs/nfsd/nfssvc.c |   33 ++++++++++++++++++++++++++++++++-
>>>>   2 files changed, 33 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
>>>> index abfc97c..5c423c6 100644
>>>> --- a/fs/nfsd/netns.h
>>>> +++ b/fs/nfsd/netns.h
>>>> @@ -101,6 +101,7 @@ struct nfsd_net {
>>>>       struct timeval nfssvc_boot;
>>>>
>>>>       struct svc_serv *nfsd_serv;
>>>> +    struct path    root;
>>>>   };
>>>>
>>>>   extern int nfsd_net_id;
>>>> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
>>>> index cee62ab..177bb60 100644
>>>> --- a/fs/nfsd/nfssvc.c
>>>> +++ b/fs/nfsd/nfssvc.c
>>>> @@ -392,6 +392,7 @@ int nfsd_create_serv(struct net *net)
>>>>
>>>>       set_max_drc();
>>>>       do_gettimeofday(&nn->nfssvc_boot);        /* record boot time */
>>>> +    get_fs_root(current->fs, &nn->root);
>>>>       return 0;
>>>>   }
>>>>
>>>> @@ -426,8 +427,10 @@ void nfsd_destroy(struct net *net)
>>>>       if (destroy)
>>>>           svc_shutdown_net(nn->nfsd_serv, net);
>>>>       svc_destroy(nn->nfsd_serv);
>>>> -    if (destroy)
>>>> +    if (destroy) {
>>>> +        path_put(&nn->root);
>>>>           nn->nfsd_serv = NULL;
>>>> +    }
>>>>   }
>>>>
>>>>   int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
>>>> @@ -533,6 +536,25 @@ out:
>>>>       return error;
>>>>   }
>>>>
>>>> +/*
>>>> + * This function is actually slightly modified set_fs_root()<F9>
>>>> + */
>>>> +static void nfsd_swap_root(struct net *net)
>>>> +{
>>>> +    struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>>> +    struct fs_struct *fs = current->fs;
>>>> +    struct path old_root;
>>>> +
>>>> +    path_get(&nn->root);
>>>> +    spin_lock(&fs->lock);
>>>> +    write_seqcount_begin(&fs->seq);
>>>> +    old_root = fs->root;
>>>> +    fs->root = nn->root;
>>>> +    write_seqcount_end(&fs->seq);
>>>> +    spin_unlock(&fs->lock);
>>>> +    if (old_root.dentry)
>>>> +        path_put(&old_root);
>>>> +}
>>>>
>>>>   /*
>>>>    * This is the NFS server kernel thread
>>>> @@ -559,6 +581,15 @@ nfsd(void *vrqstp)
>>>>       current->fs->umask = 0;
>>>>
>>>>       /*
>>>> +     * We have to swap NFSd kthread's fs->root.
>>>> +     * Why so? Because NFSd can be started in container, which has it's own
>>>> +     * root.
>>>> +     * And so what? NFSd lookup files, and lookup start from
>>>> +     * current->fs->root.
>>>> +     */
>>>> +    nfsd_swap_root(net);
>>>> +
>>>> +    /*
>>>>        * thread is spawned with all signals set to SIG_IGN, re-enable
>>>>        * the ones that will bring down the thread
>>>>        */
>>>>
>>
>>
>
>


-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-11 14:00     ` Stanislav Kinsbursky
  2012-12-11 14:12       ` [Devel] " Stanislav Kinsbursky
@ 2012-12-11 14:54       ` Al Viro
  2012-12-11 14:57         ` Stanislav Kinsbursky
  1 sibling, 1 reply; 24+ messages in thread
From: Al Viro @ 2012-12-11 14:54 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: J. Bruce Fields, linux-nfs, linux-kernel, devel

On Tue, Dec 11, 2012 at 06:00:00PM +0400, Stanislav Kinsbursky wrote:
> 11.12.2012 00:28, J. Bruce Fields ??????????:
> >On Thu, Dec 06, 2012 at 06:34:47PM +0300, Stanislav Kinsbursky wrote:
> >>NFSd does lookup. Lookup is done starting from current->fs->root.
> >>NFSd is a kthread, cloned by kthreadd, and thus have global (but luckely
> >>unshared) root.
> >>So we have to swap root to those, which process, started NFSd, has. Because
> >>that process can be in a container with it's own root.
> >
> >This doesn't sound right to me.
> >
> >Which lookups exactly do you see being done relative to
> >current->fs->root ?
> >
> 
> Ok, you are right. I was mistaken here.
> This is not a exactly lookup, but d_path() problem in svc_export_request().
> I.e. without root swapping, d_path() will give not local export path (like "/export")
> but something like this "/root/containers_root/export".

Now, *that* is a different story (and makes some sense).  Take a look
at __d_path(), please.  You don't need to set ->fs->root to get d_path()
equivalent relative to given point.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-11 14:12       ` [Devel] " Stanislav Kinsbursky
  2012-12-11 14:51         ` Stanislav Kinsbursky
@ 2012-12-11 14:56         ` J. Bruce Fields
  2012-12-11 14:58           ` Al Viro
  2012-12-11 15:07           ` Stanislav Kinsbursky
  1 sibling, 2 replies; 24+ messages in thread
From: J. Bruce Fields @ 2012-12-11 14:56 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: linux-nfs, linux-kernel, devel

On Tue, Dec 11, 2012 at 06:12:40PM +0400, Stanislav Kinsbursky wrote:
> UID: 9899
> 
> 11.12.2012 18:00, Stanislav Kinsbursky пишет:
> >11.12.2012 00:28, J. Bruce Fields пишет:
> >>On Thu, Dec 06, 2012 at 06:34:47PM +0300, Stanislav Kinsbursky wrote:
> >>>NFSd does lookup. Lookup is done starting from current->fs->root.
> >>>NFSd is a kthread, cloned by kthreadd, and thus have global (but luckely
> >>>unshared) root.
> >>>So we have to swap root to those, which process, started NFSd, has. Because
> >>>that process can be in a container with it's own root.
> >>
> >>This doesn't sound right to me.
> >>
> >>Which lookups exactly do you see being done relative to
> >>current->fs->root ?
> >>
> >
> >Ok, you are right. I was mistaken here.
> >This is not a exactly lookup, but d_path() problem in svc_export_request().
> >I.e. without root swapping, d_path() will give not local export path (like "/export")
> >but something like this "/root/containers_root/export".
> >
> 
> We, actually, can do it less in less aggressive way.
> I.e. instead root swap and current svc_export_request() implementation:
> 
> void svc_export_request(...)
> {
> 	<snip>
>         pth = d_path(&exp->ex_path, *bpp, *blen);
> 	<snip>
> }
> 
> we can do something like this:
> 
> void svc_export_request(...)
> {
> 	struct nfsd_net *nn = ...
> 	<snip>
> 	spin_lock(&dcache_lock);
>         pth = __d_path(&exp->ex_path, &nn->root, *bpp, *blen);
> 	spin_unlock(&dcache_lock);
> 	<snip>
> }

That looks simpler, but I still don't understand why we need it.

I'm confused about how d_path works; I would have thought that
filesystem namespaces would have their own vfsmount trees and hence that
the (vfsmount, dentry) would be enough to specify the path.  Is the root
argument for the case of chroot?  Do we care about that?

Also, svc_export_request is called from mountd's read of
/proc/net/rpc/nfsd.export/channel.  If mountd's root is wrong, then
nothing's going to work anyway.

--b.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-11 14:54       ` Al Viro
@ 2012-12-11 14:57         ` Stanislav Kinsbursky
  0 siblings, 0 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-11 14:57 UTC (permalink / raw)
  To: Al Viro; +Cc: J. Bruce Fields, linux-nfs, linux-kernel, devel

11.12.2012 18:54, Al Viro пишет:
> On Tue, Dec 11, 2012 at 06:00:00PM +0400, Stanislav Kinsbursky wrote:
>> 11.12.2012 00:28, J. Bruce Fields ??????????:
>>> On Thu, Dec 06, 2012 at 06:34:47PM +0300, Stanislav Kinsbursky wrote:
>>>> NFSd does lookup. Lookup is done starting from current->fs->root.
>>>> NFSd is a kthread, cloned by kthreadd, and thus have global (but luckely
>>>> unshared) root.
>>>> So we have to swap root to those, which process, started NFSd, has. Because
>>>> that process can be in a container with it's own root.
>>>
>>> This doesn't sound right to me.
>>>
>>> Which lookups exactly do you see being done relative to
>>> current->fs->root ?
>>>
>>
>> Ok, you are right. I was mistaken here.
>> This is not a exactly lookup, but d_path() problem in svc_export_request().
>> I.e. without root swapping, d_path() will give not local export path (like "/export")
>> but something like this "/root/containers_root/export".
>
> Now, *that* is a different story (and makes some sense).  Take a look
> at __d_path(), please.  You don't need to set ->fs->root to get d_path()
> equivalent relative to given point.
>

Thanks, Al.
But __d_path() is not exported and this code is called from NFSd module.
Is it suitable for you to export __d_path()?

-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-11 14:56         ` J. Bruce Fields
@ 2012-12-11 14:58           ` Al Viro
  2012-12-11 15:07           ` Stanislav Kinsbursky
  1 sibling, 0 replies; 24+ messages in thread
From: Al Viro @ 2012-12-11 14:58 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Stanislav Kinsbursky, linux-nfs, linux-kernel, devel

On Tue, Dec 11, 2012 at 09:56:21AM -0500, J. Bruce Fields wrote:

> That looks simpler, but I still don't understand why we need it.
> 
> I'm confused about how d_path works; I would have thought that
> filesystem namespaces would have their own vfsmount trees and hence that
> the (vfsmount, dentry) would be enough to specify the path.  Is the root
> argument for the case of chroot?  Do we care about that?

__d_path() is relative pathname from here to there.  Whether (and what for)
is it wanted in case of nfsd patches is a separate question...

> Also, svc_export_request is called from mountd's read of
> /proc/net/rpc/nfsd.export/channel.  If mountd's root is wrong, then
> nothing's going to work anyway.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-11 14:56         ` J. Bruce Fields
  2012-12-11 14:58           ` Al Viro
@ 2012-12-11 15:07           ` Stanislav Kinsbursky
  2012-12-11 15:20             ` J. Bruce Fields
  1 sibling, 1 reply; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-11 15:07 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel, devel

11.12.2012 18:56, J. Bruce Fields пишет:
> On Tue, Dec 11, 2012 at 06:12:40PM +0400, Stanislav Kinsbursky wrote:
>> UID: 9899
>>
>> 11.12.2012 18:00, Stanislav Kinsbursky пишет:
>>> 11.12.2012 00:28, J. Bruce Fields пишет:
>>>> On Thu, Dec 06, 2012 at 06:34:47PM +0300, Stanislav Kinsbursky wrote:
>>>>> NFSd does lookup. Lookup is done starting from current->fs->root.
>>>>> NFSd is a kthread, cloned by kthreadd, and thus have global (but luckely
>>>>> unshared) root.
>>>>> So we have to swap root to those, which process, started NFSd, has. Because
>>>>> that process can be in a container with it's own root.
>>>>
>>>> This doesn't sound right to me.
>>>>
>>>> Which lookups exactly do you see being done relative to
>>>> current->fs->root ?
>>>>
>>>
>>> Ok, you are right. I was mistaken here.
>>> This is not a exactly lookup, but d_path() problem in svc_export_request().
>>> I.e. without root swapping, d_path() will give not local export path (like "/export")
>>> but something like this "/root/containers_root/export".
>>>
>>
>> We, actually, can do it less in less aggressive way.
>> I.e. instead root swap and current svc_export_request() implementation:
>>
>> void svc_export_request(...)
>> {
>> 	<snip>
>>          pth = d_path(&exp->ex_path, *bpp, *blen);
>> 	<snip>
>> }
>>
>> we can do something like this:
>>
>> void svc_export_request(...)
>> {
>> 	struct nfsd_net *nn = ...
>> 	<snip>
>> 	spin_lock(&dcache_lock);
>>          pth = __d_path(&exp->ex_path, &nn->root, *bpp, *blen);
>> 	spin_unlock(&dcache_lock);
>> 	<snip>
>> }
>
> That looks simpler, but I still don't understand why we need it.
>
> I'm confused about how d_path works; I would have thought that
> filesystem namespaces would have their own vfsmount trees and hence that
> the (vfsmount, dentry) would be enough to specify the path.  Is the root
> argument for the case of chroot?  Do we care about that?
>

It works very simple: just traverse the tree from specified dentry up to current->fs->root.dentry.
Having container in some fully separated mount point is great, of course. But:
1) this is a limitation we really want to avoid. I.e. container can be chrooted into some path like "/root/containers_root/" as in example above.
2) NFSd kthread works in init root environment. But we anyway want to get proper path string in container root, but not in kthreads root.

> Also, svc_export_request is called from mountd's read of
> /proc/net/rpc/nfsd.export/channel.  If mountd's root is wrong, then
> nothing's going to work anyway.
>

I don't really understand, how  mountd's root can be wrong. I.e. its' always right as I see it. NFSd kthreads have to swap/use relative path/whatever to 
communicate with proper mountd.
Or I'm missing something?

> --b.
>


-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-11 15:07           ` Stanislav Kinsbursky
@ 2012-12-11 15:20             ` J. Bruce Fields
  2012-12-11 15:35               ` J. Bruce Fields
  0 siblings, 1 reply; 24+ messages in thread
From: J. Bruce Fields @ 2012-12-11 15:20 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: linux-nfs, linux-kernel, devel

On Tue, Dec 11, 2012 at 07:07:00PM +0400, Stanislav Kinsbursky wrote:
> 11.12.2012 18:56, J. Bruce Fields пишет:
> >On Tue, Dec 11, 2012 at 06:12:40PM +0400, Stanislav Kinsbursky wrote:
> >>UID: 9899
> >>
> >>11.12.2012 18:00, Stanislav Kinsbursky пишет:
> >>>11.12.2012 00:28, J. Bruce Fields пишет:
> >>>>On Thu, Dec 06, 2012 at 06:34:47PM +0300, Stanislav Kinsbursky wrote:
> >>>>>NFSd does lookup. Lookup is done starting from current->fs->root.
> >>>>>NFSd is a kthread, cloned by kthreadd, and thus have global (but luckely
> >>>>>unshared) root.
> >>>>>So we have to swap root to those, which process, started NFSd, has. Because
> >>>>>that process can be in a container with it's own root.
> >>>>
> >>>>This doesn't sound right to me.
> >>>>
> >>>>Which lookups exactly do you see being done relative to
> >>>>current->fs->root ?
> >>>>
> >>>
> >>>Ok, you are right. I was mistaken here.
> >>>This is not a exactly lookup, but d_path() problem in svc_export_request().
> >>>I.e. without root swapping, d_path() will give not local export path (like "/export")
> >>>but something like this "/root/containers_root/export".
> >>>
> >>
> >>We, actually, can do it less in less aggressive way.
> >>I.e. instead root swap and current svc_export_request() implementation:
> >>
> >>void svc_export_request(...)
> >>{
> >>	<snip>
> >>         pth = d_path(&exp->ex_path, *bpp, *blen);
> >>	<snip>
> >>}
> >>
> >>we can do something like this:
> >>
> >>void svc_export_request(...)
> >>{
> >>	struct nfsd_net *nn = ...
> >>	<snip>
> >>	spin_lock(&dcache_lock);
> >>         pth = __d_path(&exp->ex_path, &nn->root, *bpp, *blen);
> >>	spin_unlock(&dcache_lock);
> >>	<snip>
> >>}
> >
> >That looks simpler, but I still don't understand why we need it.
> >
> >I'm confused about how d_path works; I would have thought that
> >filesystem namespaces would have their own vfsmount trees and hence that
> >the (vfsmount, dentry) would be enough to specify the path.  Is the root
> >argument for the case of chroot?  Do we care about that?
> >
> 
> It works very simple: just traverse the tree from specified dentry up to current->fs->root.dentry.
> Having container in some fully separated mount point is great, of course. But:
> 1) this is a limitation we really want to avoid. I.e. container can be chrooted into some path like "/root/containers_root/" as in example above.
> 2) NFSd kthread works in init root environment. But we anyway want to get proper path string in container root, but not in kthreads root.
> 
> >Also, svc_export_request is called from mountd's read of
> >/proc/net/rpc/nfsd.export/channel.  If mountd's root is wrong, then
> >nothing's going to work anyway.
> >
> 
> I don't really understand, how  mountd's root can be wrong. I.e.
> its' always right as I see it. NFSd kthreads have to swap/use
> relative path/whatever to communicate with proper mountd.
> Or I'm missing something?

Ugh, I see the problem: I thought svc_export_request was called at the
time mountd does the read, but instead its done at the time nfsd does
the upcall.

I suspect that's wrong, and we really want this done in the context of
the mountd process when it does the read call.  If d_path is called
there then we have no problem.

--b.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-11 15:20             ` J. Bruce Fields
@ 2012-12-11 15:35               ` J. Bruce Fields
  2012-12-12  7:45                 ` Stanislav Kinsbursky
  2013-01-11 14:56                 ` Stanislav Kinsbursky
  0 siblings, 2 replies; 24+ messages in thread
From: J. Bruce Fields @ 2012-12-11 15:35 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: linux-nfs, linux-kernel, devel

On Tue, Dec 11, 2012 at 10:20:36AM -0500, J. Bruce Fields wrote:
> On Tue, Dec 11, 2012 at 07:07:00PM +0400, Stanislav Kinsbursky wrote:
> > I don't really understand, how  mountd's root can be wrong. I.e.
> > its' always right as I see it. NFSd kthreads have to swap/use
> > relative path/whatever to communicate with proper mountd.
> > Or I'm missing something?
> 
> Ugh, I see the problem: I thought svc_export_request was called at the
> time mountd does the read, but instead its done at the time nfsd does
> the upcall.
> 
> I suspect that's wrong, and we really want this done in the context of
> the mountd process when it does the read call.  If d_path is called
> there then we have no problem.

Right, so I'd be happier if we could modify sunrpc_cache_pipe_upcall to
skip calling cache_request and instead delay that until cache_read().  I
think that should be possible.

--b.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-11 15:35               ` J. Bruce Fields
@ 2012-12-12  7:45                 ` Stanislav Kinsbursky
  2013-01-11 14:56                 ` Stanislav Kinsbursky
  1 sibling, 0 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-12  7:45 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel, devel

11.12.2012 19:35, J. Bruce Fields пишет:
> On Tue, Dec 11, 2012 at 10:20:36AM -0500, J. Bruce Fields wrote:
>> On Tue, Dec 11, 2012 at 07:07:00PM +0400, Stanislav Kinsbursky wrote:
>>> I don't really understand, how  mountd's root can be wrong. I.e.
>>> its' always right as I see it. NFSd kthreads have to swap/use
>>> relative path/whatever to communicate with proper mountd.
>>> Or I'm missing something?
>>
>> Ugh, I see the problem: I thought svc_export_request was called at the
>> time mountd does the read, but instead its done at the time nfsd does
>> the upcall.
>>
>> I suspect that's wrong, and we really want this done in the context of
>> the mountd process when it does the read call.  If d_path is called
>> there then we have no problem.
>
> Right, so I'd be happier if we could modify sunrpc_cache_pipe_upcall to
> skip calling cache_request and instead delay that until cache_read().  I
> think that should be possible.
>

Hmmm...
I'm not familiar with mountd to kernel exchange protocol.
It looks like kernel passes that path string to mountd on upcall.

> --b.
>


-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2012-12-11 15:35               ` J. Bruce Fields
  2012-12-12  7:45                 ` Stanislav Kinsbursky
@ 2013-01-11 14:56                 ` Stanislav Kinsbursky
  2013-01-11 17:03                   ` J. Bruce Fields
  1 sibling, 1 reply; 24+ messages in thread
From: Stanislav Kinsbursky @ 2013-01-11 14:56 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel, devel

11.12.2012 19:35, J. Bruce Fields пишет:
> On Tue, Dec 11, 2012 at 10:20:36AM -0500, J. Bruce Fields wrote:
>> On Tue, Dec 11, 2012 at 07:07:00PM +0400, Stanislav Kinsbursky wrote:
>>> I don't really understand, how  mountd's root can be wrong. I.e.
>>> its' always right as I see it. NFSd kthreads have to swap/use
>>> relative path/whatever to communicate with proper mountd.
>>> Or I'm missing something?
>>
>> Ugh, I see the problem: I thought svc_export_request was called at the
>> time mountd does the read, but instead its done at the time nfsd does
>> the upcall.
>>
>> I suspect that's wrong, and we really want this done in the context of
>> the mountd process when it does the read call.  If d_path is called
>> there then we have no problem.
>
> Right, so I'd be happier if we could modify sunrpc_cache_pipe_upcall to
> skip calling cache_request and instead delay that until cache_read().  I
> think that should be possible.
>

So, Bruce, what we going to do (or what you want me to do) with the rest of NFSd changes?
I.e. how I should solve this d_path() problem?
I.e. I don't understand what did you mean by "I'd be happier if we could modify sunrpc_cache_pipe_upcall to
skip calling cache_request and instead delay that until cache_read()".
Could you give me a hint?

> --b.
>


-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2013-01-11 14:56                 ` Stanislav Kinsbursky
@ 2013-01-11 17:03                   ` J. Bruce Fields
  2013-01-11 17:20                     ` J. Bruce Fields
  2013-01-14  6:08                     ` Stanislav Kinsbursky
  0 siblings, 2 replies; 24+ messages in thread
From: J. Bruce Fields @ 2013-01-11 17:03 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: linux-nfs, linux-kernel, devel

On Fri, Jan 11, 2013 at 06:56:58PM +0400, Stanislav Kinsbursky wrote:
> 11.12.2012 19:35, J. Bruce Fields пишет:
> >On Tue, Dec 11, 2012 at 10:20:36AM -0500, J. Bruce Fields wrote:
> >>On Tue, Dec 11, 2012 at 07:07:00PM +0400, Stanislav Kinsbursky wrote:
> >>>I don't really understand, how  mountd's root can be wrong. I.e.
> >>>its' always right as I see it. NFSd kthreads have to swap/use
> >>>relative path/whatever to communicate with proper mountd.
> >>>Or I'm missing something?
> >>
> >>Ugh, I see the problem: I thought svc_export_request was called at the
> >>time mountd does the read, but instead its done at the time nfsd does
> >>the upcall.
> >>
> >>I suspect that's wrong, and we really want this done in the context of
> >>the mountd process when it does the read call.  If d_path is called
> >>there then we have no problem.
> >
> >Right, so I'd be happier if we could modify sunrpc_cache_pipe_upcall to
> >skip calling cache_request and instead delay that until cache_read().  I
> >think that should be possible.
> >
> 
> So, Bruce, what we going to do (or what you want me to do) with the rest of NFSd changes?
> I.e. how I should solve this d_path() problem?
> I.e. I don't understand what did you mean by "I'd be happier if we could modify sunrpc_cache_pipe_upcall to
> skip calling cache_request and instead delay that until cache_read()".
> Could you give me a hint?

Definitely.  So normally the way these upcalls happen are:

	1. the kernel does a cache lookup, finds no matching item, and
	   calls sunrpc_cache_pipe_upcall().
	2. sunrpc_cache_pipe_upcall() formats the upcall: it allocates a
	   struct cache_request crq and fills crq->buf with the upcall
	   data by calling the cache's ->cache_request() method.
	3. Then rpc.mountd realizes there's data available in
	   /proc/net/rpc/nfsd.fh/content, so it does a read on that file.
	4. cache_read copies the formatted upcall from crq->buf to
	   to userspace.

So all I'm suggesting is that instead of calling ->cache_request() at
step 2, we do it at step 4.

Then cache_request will be called from rpc.mountd's read.  So we'll know
which container rpc.mountd is in.

Does that make sense?

--b.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2013-01-11 17:03                   ` J. Bruce Fields
@ 2013-01-11 17:20                     ` J. Bruce Fields
  2013-01-14  6:17                       ` Stanislav Kinsbursky
  2013-01-14  6:08                     ` Stanislav Kinsbursky
  1 sibling, 1 reply; 24+ messages in thread
From: J. Bruce Fields @ 2013-01-11 17:20 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: linux-nfs, linux-kernel, devel

On Fri, Jan 11, 2013 at 12:03:12PM -0500, J. Bruce Fields wrote:
> On Fri, Jan 11, 2013 at 06:56:58PM +0400, Stanislav Kinsbursky wrote:
> > 11.12.2012 19:35, J. Bruce Fields пишет:
> > >On Tue, Dec 11, 2012 at 10:20:36AM -0500, J. Bruce Fields wrote:
> > >>On Tue, Dec 11, 2012 at 07:07:00PM +0400, Stanislav Kinsbursky wrote:
> > >>>I don't really understand, how  mountd's root can be wrong. I.e.
> > >>>its' always right as I see it. NFSd kthreads have to swap/use
> > >>>relative path/whatever to communicate with proper mountd.
> > >>>Or I'm missing something?
> > >>
> > >>Ugh, I see the problem: I thought svc_export_request was called at the
> > >>time mountd does the read, but instead its done at the time nfsd does
> > >>the upcall.
> > >>
> > >>I suspect that's wrong, and we really want this done in the context of
> > >>the mountd process when it does the read call.  If d_path is called
> > >>there then we have no problem.
> > >
> > >Right, so I'd be happier if we could modify sunrpc_cache_pipe_upcall to
> > >skip calling cache_request and instead delay that until cache_read().  I
> > >think that should be possible.
> > >
> > 
> > So, Bruce, what we going to do (or what you want me to do) with the rest of NFSd changes?
> > I.e. how I should solve this d_path() problem?
> > I.e. I don't understand what did you mean by "I'd be happier if we could modify sunrpc_cache_pipe_upcall to
> > skip calling cache_request and instead delay that until cache_read()".
> > Could you give me a hint?
> 
> Definitely.  So normally the way these upcalls happen are:
> 
> 	1. the kernel does a cache lookup, finds no matching item, and
> 	   calls sunrpc_cache_pipe_upcall().
> 	2. sunrpc_cache_pipe_upcall() formats the upcall: it allocates a
> 	   struct cache_request crq and fills crq->buf with the upcall
> 	   data by calling the cache's ->cache_request() method.
> 	3. Then rpc.mountd realizes there's data available in
> 	   /proc/net/rpc/nfsd.fh/content, so it does a read on that file.
> 	4. cache_read copies the formatted upcall from crq->buf to
> 	   to userspace.
> 
> So all I'm suggesting is that instead of calling ->cache_request() at
> step 2, we do it at step 4.
> 
> Then cache_request will be called from rpc.mountd's read.  So we'll know
> which container rpc.mountd is in.
> 
> Does that make sense?

The following is untested, ugly, and almost certainly insufficient and
wrong, but maybe it's a starting point:

--b.

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 9f84703..f15e4c1 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -744,6 +744,7 @@ struct cache_request {
 	char			* buf;
 	int			len;
 	int			readers;
+	void (*cache_request)(struct cache_detail *, struct cache_head *, char **, int *);
 };
 struct cache_reader {
 	struct cache_queue	q;
@@ -785,10 +786,19 @@ static ssize_t cache_read(struct file *filp, char __user *buf, size_t count,
 	spin_unlock(&queue_lock);
 
 	if (rp->offset == 0 && !test_bit(CACHE_PENDING, &rq->item->flags)) {
+		char *bp;
+		int len = PAGE_SIZE;
+
 		err = -EAGAIN;
 		spin_lock(&queue_lock);
 		list_move(&rp->q.list, &rq->q.list);
 		spin_unlock(&queue_lock);
+
+		bp = rq->buf;
+		rq->cache_request(cd, rq->item, &bp, &len);
+		if (rq->len < 0)
+			goto out;
+		rq->len = PAGE_SIZE - len;
 	} else {
 		if (rp->offset + count > rq->len)
 			count = rq->len - rp->offset;
@@ -1149,8 +1159,6 @@ int sunrpc_cache_pipe_upcall(struct cache_detail *detail, struct cache_head *h,
 
 	char *buf;
 	struct cache_request *crq;
-	char *bp;
-	int len;
 
 	if (!cache_listeners_exist(detail)) {
 		warn_no_listener(detail);
@@ -1167,19 +1175,10 @@ int sunrpc_cache_pipe_upcall(struct cache_detail *detail, struct cache_head *h,
 		return -EAGAIN;
 	}
 
-	bp = buf; len = PAGE_SIZE;
-
-	cache_request(detail, h, &bp, &len);
-
-	if (len < 0) {
-		kfree(buf);
-		kfree(crq);
-		return -EAGAIN;
-	}
+	crq->cache_request = cache_request;
 	crq->q.reader = 0;
 	crq->item = cache_get(h);
 	crq->buf = buf;
-	crq->len = PAGE_SIZE - len;
 	crq->readers = 0;
 	spin_lock(&queue_lock);
 	list_add_tail(&crq->q.list, &detail->queue);

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2013-01-11 17:03                   ` J. Bruce Fields
  2013-01-11 17:20                     ` J. Bruce Fields
@ 2013-01-14  6:08                     ` Stanislav Kinsbursky
  1 sibling, 0 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2013-01-14  6:08 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel, devel

11.01.2013 21:03, J. Bruce Fields пишет:
> On Fri, Jan 11, 2013 at 06:56:58PM +0400, Stanislav Kinsbursky wrote:
>> 11.12.2012 19:35, J. Bruce Fields пишет:
>>> On Tue, Dec 11, 2012 at 10:20:36AM -0500, J. Bruce Fields wrote:
>>>> On Tue, Dec 11, 2012 at 07:07:00PM +0400, Stanislav Kinsbursky wrote:
>>>>> I don't really understand, how  mountd's root can be wrong. I.e.
>>>>> its' always right as I see it. NFSd kthreads have to swap/use
>>>>> relative path/whatever to communicate with proper mountd.
>>>>> Or I'm missing something?
>>>>
>>>> Ugh, I see the problem: I thought svc_export_request was called at the
>>>> time mountd does the read, but instead its done at the time nfsd does
>>>> the upcall.
>>>>
>>>> I suspect that's wrong, and we really want this done in the context of
>>>> the mountd process when it does the read call.  If d_path is called
>>>> there then we have no problem.
>>>
>>> Right, so I'd be happier if we could modify sunrpc_cache_pipe_upcall to
>>> skip calling cache_request and instead delay that until cache_read().  I
>>> think that should be possible.
>>>
>>
>> So, Bruce, what we going to do (or what you want me to do) with the rest of NFSd changes?
>> I.e. how I should solve this d_path() problem?
>> I.e. I don't understand what did you mean by "I'd be happier if we could modify sunrpc_cache_pipe_upcall to
>> skip calling cache_request and instead delay that until cache_read()".
>> Could you give me a hint?
>
> Definitely.  So normally the way these upcalls happen are:
>
> 	1. the kernel does a cache lookup, finds no matching item, and
> 	   calls sunrpc_cache_pipe_upcall().
> 	2. sunrpc_cache_pipe_upcall() formats the upcall: it allocates a
> 	   struct cache_request crq and fills crq->buf with the upcall
> 	   data by calling the cache's ->cache_request() method.
> 	3. Then rpc.mountd realizes there's data available in
> 	   /proc/net/rpc/nfsd.fh/content, so it does a read on that file.
> 	4. cache_read copies the formatted upcall from crq->buf to
> 	   to userspace.
>
> So all I'm suggesting is that instead of calling ->cache_request() at
> step 2, we do it at step 4.
>
> Then cache_request will be called from rpc.mountd's read.  So we'll know
> which container rpc.mountd is in.
>
> Does that make sense?
>

Ok, now I understand.
If the upcall is just the way to inform rpc.mound, that there are some data to read and process, then it your proposing changes should work.
I'll prepare initial patch set.
Thanks!

> --b.
>


-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Devel] [PATCH 2/6] nfsd: swap fs root in NFSd kthreads
  2013-01-11 17:20                     ` J. Bruce Fields
@ 2013-01-14  6:17                       ` Stanislav Kinsbursky
  0 siblings, 0 replies; 24+ messages in thread
From: Stanislav Kinsbursky @ 2013-01-14  6:17 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel, devel

Thanks!

11.01.2013 21:20, J. Bruce Fields пишет:
> On Fri, Jan 11, 2013 at 12:03:12PM -0500, J. Bruce Fields wrote:
>> On Fri, Jan 11, 2013 at 06:56:58PM +0400, Stanislav Kinsbursky wrote:
>>> 11.12.2012 19:35, J. Bruce Fields пишет:
>>>> On Tue, Dec 11, 2012 at 10:20:36AM -0500, J. Bruce Fields wrote:
>>>>> On Tue, Dec 11, 2012 at 07:07:00PM +0400, Stanislav Kinsbursky wrote:
>>>>>> I don't really understand, how  mountd's root can be wrong. I.e.
>>>>>> its' always right as I see it. NFSd kthreads have to swap/use
>>>>>> relative path/whatever to communicate with proper mountd.
>>>>>> Or I'm missing something?
>>>>>
>>>>> Ugh, I see the problem: I thought svc_export_request was called at the
>>>>> time mountd does the read, but instead its done at the time nfsd does
>>>>> the upcall.
>>>>>
>>>>> I suspect that's wrong, and we really want this done in the context of
>>>>> the mountd process when it does the read call.  If d_path is called
>>>>> there then we have no problem.
>>>>
>>>> Right, so I'd be happier if we could modify sunrpc_cache_pipe_upcall to
>>>> skip calling cache_request and instead delay that until cache_read().  I
>>>> think that should be possible.
>>>>
>>>
>>> So, Bruce, what we going to do (or what you want me to do) with the rest of NFSd changes?
>>> I.e. how I should solve this d_path() problem?
>>> I.e. I don't understand what did you mean by "I'd be happier if we could modify sunrpc_cache_pipe_upcall to
>>> skip calling cache_request and instead delay that until cache_read()".
>>> Could you give me a hint?
>>
>> Definitely.  So normally the way these upcalls happen are:
>>
>> 	1. the kernel does a cache lookup, finds no matching item, and
>> 	   calls sunrpc_cache_pipe_upcall().
>> 	2. sunrpc_cache_pipe_upcall() formats the upcall: it allocates a
>> 	   struct cache_request crq and fills crq->buf with the upcall
>> 	   data by calling the cache's ->cache_request() method.
>> 	3. Then rpc.mountd realizes there's data available in
>> 	   /proc/net/rpc/nfsd.fh/content, so it does a read on that file.
>> 	4. cache_read copies the formatted upcall from crq->buf to
>> 	   to userspace.
>>
>> So all I'm suggesting is that instead of calling ->cache_request() at
>> step 2, we do it at step 4.
>>
>> Then cache_request will be called from rpc.mountd's read.  So we'll know
>> which container rpc.mountd is in.
>>
>> Does that make sense?
>
> The following is untested, ugly, and almost certainly insufficient and
> wrong, but maybe it's a starting point:
>
> --b.
>
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index 9f84703..f15e4c1 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -744,6 +744,7 @@ struct cache_request {
>   	char			* buf;
>   	int			len;
>   	int			readers;
> +	void (*cache_request)(struct cache_detail *, struct cache_head *, char **, int *);
>   };
>   struct cache_reader {
>   	struct cache_queue	q;
> @@ -785,10 +786,19 @@ static ssize_t cache_read(struct file *filp, char __user *buf, size_t count,
>   	spin_unlock(&queue_lock);
>
>   	if (rp->offset == 0 && !test_bit(CACHE_PENDING, &rq->item->flags)) {
> +		char *bp;
> +		int len = PAGE_SIZE;
> +
>   		err = -EAGAIN;
>   		spin_lock(&queue_lock);
>   		list_move(&rp->q.list, &rq->q.list);
>   		spin_unlock(&queue_lock);
> +
> +		bp = rq->buf;
> +		rq->cache_request(cd, rq->item, &bp, &len);
> +		if (rq->len < 0)
> +			goto out;
> +		rq->len = PAGE_SIZE - len;
>   	} else {
>   		if (rp->offset + count > rq->len)
>   			count = rq->len - rp->offset;
> @@ -1149,8 +1159,6 @@ int sunrpc_cache_pipe_upcall(struct cache_detail *detail, struct cache_head *h,
>
>   	char *buf;
>   	struct cache_request *crq;
> -	char *bp;
> -	int len;
>
>   	if (!cache_listeners_exist(detail)) {
>   		warn_no_listener(detail);
> @@ -1167,19 +1175,10 @@ int sunrpc_cache_pipe_upcall(struct cache_detail *detail, struct cache_head *h,
>   		return -EAGAIN;
>   	}
>
> -	bp = buf; len = PAGE_SIZE;
> -
> -	cache_request(detail, h, &bp, &len);
> -
> -	if (len < 0) {
> -		kfree(buf);
> -		kfree(crq);
> -		return -EAGAIN;
> -	}
> +	crq->cache_request = cache_request;
>   	crq->q.reader = 0;
>   	crq->item = cache_get(h);
>   	crq->buf = buf;
> -	crq->len = PAGE_SIZE - len;
>   	crq->readers = 0;
>   	spin_lock(&queue_lock);
>   	list_add_tail(&crq->q.list, &detail->queue);
>


-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2013-01-14  6:18 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-06 15:34 [PATCH 0/6] nfsd: make is works in a container Stanislav Kinsbursky
2012-12-06 15:34 ` [PATCH 1/6] nfsd: pass proper net to nfsd_destroy() from NFSd kthreads Stanislav Kinsbursky
2012-12-06 15:34 ` [PATCH 2/6] nfsd: swap fs root in " Stanislav Kinsbursky
2012-12-10 20:28   ` J. Bruce Fields
2012-12-11 14:00     ` Stanislav Kinsbursky
2012-12-11 14:12       ` [Devel] " Stanislav Kinsbursky
2012-12-11 14:51         ` Stanislav Kinsbursky
2012-12-11 14:56         ` J. Bruce Fields
2012-12-11 14:58           ` Al Viro
2012-12-11 15:07           ` Stanislav Kinsbursky
2012-12-11 15:20             ` J. Bruce Fields
2012-12-11 15:35               ` J. Bruce Fields
2012-12-12  7:45                 ` Stanislav Kinsbursky
2013-01-11 14:56                 ` Stanislav Kinsbursky
2013-01-11 17:03                   ` J. Bruce Fields
2013-01-11 17:20                     ` J. Bruce Fields
2013-01-14  6:17                       ` Stanislav Kinsbursky
2013-01-14  6:08                     ` Stanislav Kinsbursky
2012-12-11 14:54       ` Al Viro
2012-12-11 14:57         ` Stanislav Kinsbursky
2012-12-06 15:34 ` [PATCH 3/6] nfsd: make containerise NFSd filesystem Stanislav Kinsbursky
2012-12-06 15:34 ` [PATCH 4/6] nfsd: use proper net while reading "exports" file Stanislav Kinsbursky
2012-12-06 15:35 ` [PATCH 5/6] nfsd: disable usermode helper client tracker in container Stanislav Kinsbursky
2012-12-06 15:35 ` [PATCH 6/6] nfsd: enable NFSv4 state in containers Stanislav Kinsbursky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).