* [PATCH 0/4] Super simple flex file server
@ 2016-05-25  5:09 Tom Haynes
  2016-05-25  5:09 ` [PATCH 1/4] nfsd: flex file device id encoding will need the server addres Tom Haynes
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Tom Haynes @ 2016-05-25  5:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linux NFS Mailing list, Christoph Hellwig

Hi Bruce,

The following patches implement a flex file server in which
the mds and the ds are the same machine, and the same inode
services both metadata and data.

My biggest concern is the selection of layout type
in nfsd4_setup_layout_type(). If CONFIG_NFSD_BLOCKLAYOUT,
CONFIG_NFSD_SCSILAYOUT, and CONFIG_NFSD_FLEXFILELAYOUT
are all selected, then the flex file layout type will win. :-)
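
The precedence concern above can be modelled in user space. This is only a
sketch: the enum, the struct and pick_layout() are hypothetical stand-ins,
not the kernel's nfsd4_setup_layout_type(), and the per-export checks that
the block and SCSI layouts perform are reduced to plain booleans. It shows
that when each enabled layout assigns the type in turn, the last assignment
(flex files in patch 3/4) overrides the others.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's layout type constants. */
enum layout_type {
	LAYOUT_NONE,
	LAYOUT_BLOCK,
	LAYOUT_SCSI,
	LAYOUT_FLEX_FILES,
};

/* Hypothetical summary of which layouts are built in and usable. */
struct layout_config {
	bool have_block;	/* block layout built in, export qualifies */
	bool have_scsi;		/* SCSI layout built in, export qualifies */
	bool have_flexfile;	/* flex file layout built in */
};

/*
 * Each enabled layout assigns the type in turn, so whichever
 * assignment comes last decides the result.
 */
static enum layout_type pick_layout(const struct layout_config *cfg,
				    bool pnfs_export)
{
	enum layout_type type = LAYOUT_NONE;

	if (!pnfs_export)
		return type;

	if (cfg->have_block)
		type = LAYOUT_BLOCK;
	if (cfg->have_scsi)
		type = LAYOUT_SCSI;
	if (cfg->have_flexfile)
		type = LAYOUT_FLEX_FILES;	/* unconditional, so it wins */

	return type;
}
```

With all three flags set, pick_layout() returns LAYOUT_FLEX_FILES. An
explicit priority, or a per-export choice, would be needed for the other
layouts to be selectable in the same kernel build.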

These patches are also in my flex_server branch at
git://git.linux-nfs.org/projects/loghyr/linux-nfs.git

Thanks,
Tom

Tom Haynes (4):
  nfsd: flex file device id encoding will need the server addres
  nfsd: Can leak pnfs_block_extent on error
  nfsd: Add a super simple flex file server
  nfsd: Provide a config option for flex file layouts

 fs/nfsd/Kconfig             |  13 ++++
 fs/nfsd/Makefile            |   1 +
 fs/nfsd/blocklayout.c       |   6 +-
 fs/nfsd/flexfilelayout.c    | 148 ++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/flexfilelayoutxdr.c | 116 ++++++++++++++++++++++++++++++++++
 fs/nfsd/flexfilelayoutxdr.h |  50 +++++++++++++++
 fs/nfsd/nfs4layouts.c       |  10 +++
 fs/nfsd/nfs4proc.c          |   1 +
 fs/nfsd/pnfs.h              |   4 ++
 9 files changed, 348 insertions(+), 1 deletion(-)
 create mode 100644 fs/nfsd/flexfilelayout.c
 create mode 100644 fs/nfsd/flexfilelayoutxdr.c
 create mode 100644 fs/nfsd/flexfilelayoutxdr.h

-- 
1.8.3.1



* [PATCH 1/4] nfsd: flex file device id encoding will need the server addres
  2016-05-25  5:09 [PATCH 0/4] Super simple flex file server Tom Haynes
@ 2016-05-25  5:09 ` Tom Haynes
  2016-05-25 11:49   ` Jeff Layton
  2016-05-25 15:08   ` Christoph Hellwig
  2016-05-25  5:09 ` [PATCH 2/4] nfsd: Can leak pnfs_block_extent on error Tom Haynes
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 22+ messages in thread
From: Tom Haynes @ 2016-05-25  5:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linux NFS Mailing list, Christoph Hellwig

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
---
 fs/nfsd/blocklayout.c | 2 ++
 fs/nfsd/nfs4proc.c    | 1 +
 fs/nfsd/pnfs.h        | 1 +
 3 files changed, 4 insertions(+)

diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
index e55b524..248adb6 100644
--- a/fs/nfsd/blocklayout.c
+++ b/fs/nfsd/blocklayout.c
@@ -162,6 +162,7 @@ nfsd4_block_get_device_info_simple(struct super_block *sb,
 
 static __be32
 nfsd4_block_proc_getdeviceinfo(struct super_block *sb,
+		struct svc_rqst *rqstp,
 		struct nfs4_client *clp,
 		struct nfsd4_getdeviceinfo *gdp)
 {
@@ -354,6 +355,7 @@ nfsd4_block_get_device_info_scsi(struct super_block *sb,
 
 static __be32
 nfsd4_scsi_proc_getdeviceinfo(struct super_block *sb,
+		struct svc_rqst *rqstp,
 		struct nfs4_client *clp,
 		struct nfsd4_getdeviceinfo *gdp)
 {
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index de1ff1d..b28e45b 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1270,6 +1270,7 @@ nfsd4_getdeviceinfo(struct svc_rqst *rqstp,
 	nfserr = nfs_ok;
 	if (gdp->gd_maxcount != 0) {
 		nfserr = ops->proc_getdeviceinfo(exp->ex_path.mnt->mnt_sb,
+					rqstp,
 					cstate->session->se_client, gdp);
 	}
 
diff --git a/fs/nfsd/pnfs.h b/fs/nfsd/pnfs.h
index 7d073b9..e855677 100644
--- a/fs/nfsd/pnfs.h
+++ b/fs/nfsd/pnfs.h
@@ -21,6 +21,7 @@ struct nfsd4_layout_ops {
 	u32		notify_types;
 
 	__be32 (*proc_getdeviceinfo)(struct super_block *sb,
+			struct svc_rqst *rqstp,
 			struct nfs4_client *clp,
 			struct nfsd4_getdeviceinfo *gdevp);
 	__be32 (*encode_getdeviceinfo)(struct xdr_stream *xdr,
-- 
1.8.3.1
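
The extra rqstp argument exists so that the flex file GETDEVICEINFO handler
in patch 3/4 can read the server's own address from rqstp->rq_daddr and
advertise it as an NFS universal address: the host string followed by the
port split into its high and low octets. A minimal user-space sketch of
that formatting follows. format_uaddr() is a hypothetical helper, not a
kernel function, and it assumes the host string has already been produced
(the patch uses rpc_ntop() for that).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "host.p1.p2", where p1 and p2 are the high and low octets of
 * the port. Returns the string length (excluding the NUL), or -1 if the
 * result does not fit in buf.
 */
int format_uaddr(char *buf, size_t buflen, const char *host, uint16_t port)
{
	int n;

	n = snprintf(buf, buflen, "%s.%u.%u", host,
		     (unsigned int)(port >> 8),
		     (unsigned int)(port & 0xff));

	/* snprintf() reports the untruncated length, so check it. */
	if (n < 0 || (size_t)n >= buflen)
		return -1;

	return n;
}
```

For host "192.0.2.1" and port 2049 this yields "192.0.2.1.8.1", since
2049 = 8 * 256 + 1. The truncation check matters because snprintf() returns
the length the full string would have had, which is not safe to use as a
buffer length when the output was cut short.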



* [PATCH 2/4] nfsd: Can leak pnfs_block_extent on error
  2016-05-25  5:09 [PATCH 0/4] Super simple flex file server Tom Haynes
  2016-05-25  5:09 ` [PATCH 1/4] nfsd: flex file device id encoding will need the server addres Tom Haynes
@ 2016-05-25  5:09 ` Tom Haynes
  2016-05-25 11:50   ` Jeff Layton
  2016-05-25 15:07   ` Christoph Hellwig
  2016-05-25  5:09 ` [PATCH 3/4] nfsd: Add a super simple flex file server Tom Haynes
  2016-05-25  5:09 ` [PATCH 4/4] nfsd: Provide a config option for flex file layouts Tom Haynes
  3 siblings, 2 replies; 22+ messages in thread
From: Tom Haynes @ 2016-05-25  5:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linux NFS Mailing list, Christoph Hellwig

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
---
 fs/nfsd/blocklayout.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
index 248adb6..9a195f1 100644
--- a/fs/nfsd/blocklayout.c
+++ b/fs/nfsd/blocklayout.c
@@ -23,7 +23,7 @@ nfsd4_block_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
 	struct nfsd4_layout_seg *seg = &args->lg_seg;
 	struct super_block *sb = inode->i_sb;
 	u32 block_size = (1 << inode->i_blkbits);
-	struct pnfs_block_extent *bex;
+	struct pnfs_block_extent *bex = NULL;
 	struct iomap iomap;
 	u32 device_generation = 0;
 	int error;
@@ -105,9 +105,11 @@ nfsd4_block_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
 	return 0;
 
 out_error:
+	kfree(bex);
 	seg->length = 0;
 	return nfserrno(error);
 out_layoutunavailable:
+	kfree(bex);
 	seg->length = 0;
 	return nfserr_layoutunavailable;
 }
-- 
1.8.3.1
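
The fix relies on two things: the pointer starts out NULL, and freeing a
NULL pointer is a no-op, so shared error labels can free unconditionally
whether or not the allocation has happened yet. The user-space sketch below
illustrates the pattern with free() in place of kfree(). struct extent and
get_extent() are hypothetical and much simpler than the real
nfsd4_block_proc_layoutget().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct pnfs_block_extent. */
struct extent {
	unsigned long long offset;
	unsigned long long length;
};

/*
 * Allocate an extent, with failure points before and after the
 * allocation that both jump to the same error label.
 */
int get_extent(int fail_before_alloc, int fail_after_alloc,
	       struct extent **out)
{
	struct extent *ex = NULL;	/* NULL so out_error can always free */
	int error;

	if (fail_before_alloc) {
		error = -EINVAL;
		goto out_error;		/* ex is NULL, free(NULL) does nothing */
	}

	ex = calloc(1, sizeof(*ex));
	if (!ex) {
		error = -ENOMEM;
		goto out_error;
	}

	if (fail_after_alloc) {
		error = -EIO;
		goto out_error;		/* without the free below, ex leaks */
	}

	*out = ex;
	return 0;

out_error:
	free(ex);
	*out = NULL;
	return error;
}
```

On success the caller owns the extent and must free it. On either failure
path nothing is left allocated.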



* [PATCH 3/4] nfsd: Add a super simple flex file server
  2016-05-25  5:09 [PATCH 0/4] Super simple flex file server Tom Haynes
  2016-05-25  5:09 ` [PATCH 1/4] nfsd: flex file device id encoding will need the server addres Tom Haynes
  2016-05-25  5:09 ` [PATCH 2/4] nfsd: Can leak pnfs_block_extent on error Tom Haynes
@ 2016-05-25  5:09 ` Tom Haynes
  2016-05-25 12:00   ` Jeff Layton
                     ` (2 more replies)
  2016-05-25  5:09 ` [PATCH 4/4] nfsd: Provide a config option for flex file layouts Tom Haynes
  3 siblings, 3 replies; 22+ messages in thread
From: Tom Haynes @ 2016-05-25  5:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linux NFS Mailing list, Christoph Hellwig

Have a simple flex file server where the mds (NFSv4.1 or NFSv4.2)
is also the ds (NFSv3). I.e., the metadata and the data file are
the exact same file.

This will allow testing of the flex file client.

Simply add the "pnfs" export option to your export
in /etc/exports and mount from a client that supports
flex files.

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
---
 fs/nfsd/Makefile            |   1 +
 fs/nfsd/flexfilelayout.c    | 148 ++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/flexfilelayoutxdr.c | 116 ++++++++++++++++++++++++++++++++++
 fs/nfsd/flexfilelayoutxdr.h |  50 +++++++++++++++
 fs/nfsd/nfs4layouts.c       |  10 +++
 fs/nfsd/pnfs.h              |   3 +
 6 files changed, 328 insertions(+)
 create mode 100644 fs/nfsd/flexfilelayout.c
 create mode 100644 fs/nfsd/flexfilelayoutxdr.c
 create mode 100644 fs/nfsd/flexfilelayoutxdr.h

diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index 3ae5f3c..5f5d3a7 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -20,3 +20,4 @@ nfsd-$(CONFIG_NFSD_V4)	+= nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \
 nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
 nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
 nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
+nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
diff --git a/fs/nfsd/flexfilelayout.c b/fs/nfsd/flexfilelayout.c
new file mode 100644
index 0000000..d28b8a0
--- /dev/null
+++ b/fs/nfsd/flexfilelayout.c
@@ -0,0 +1,148 @@
+/*
+ * Copyright (c) 2016 Tom Haynes <loghyr@primarydata.com>
+ *
+ * The following implements a super-simple flex-file server
+ * where the NFSv4.1 mds is also the ds. And the storage is
+ * the same. I.e., writing to the mds via a NFSv4.1 WRITE
+ * goes to the same location as the NFSv3 WRITE.
+ */
+#include <linux/exportfs.h>
+#include <linux/genhd.h>
+#include <linux/slab.h>
+#include <linux/pr.h>
+
+#include <linux/nfsd/debug.h>
+
+#include <linux/sunrpc/addr.h>
+
+#include "flexfilelayoutxdr.h"
+#include "pnfs.h"
+
+#define NFSDDBG_FACILITY	NFSDDBG_PNFS
+
+static __be32
+nfsd4_ff_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
+		struct nfsd4_layoutget *args)
+{
+	struct nfsd4_layout_seg *seg = &args->lg_seg;
+	u32 block_size = (1 << inode->i_blkbits);
+	u32 device_generation = 0;
+	int error;
+
+	struct pnfs_ff_layout *fl;
+
+	if (seg->offset & (block_size - 1)) {
+		dprintk("pnfsd: I/O misaligned\n");
+		goto out_layoutunavailable;
+	}
+
+	/*
+	 * The super simple flex file server has 1 mirror, 1 data server,
+	 * and 1 file handle. So instead of 4 allocs, do 1 for now.
+	 * Zero it out for the stateid - don't want junk in there!
+	 */
+	error = -ENOMEM;
+	fl = kzalloc(sizeof(*fl), GFP_KERNEL);
+	if (!fl)
+		goto out_error;
+	args->lg_content = fl;
+
+	/*
+	 * Avoid layout commit, try to force the I/O to the DS,
+	 * and for fun, cause all IOMODE_RW layout segments to
+	 * effectively be WRITE only.
+	 */
+	fl->flags = FF_FLAGS_NO_LAYOUTCOMMIT | FF_FLAGS_NO_IO_THRU_MDS |
+		    FF_FLAGS_NO_READ_IO;
+
+	fl->uid = inode->i_uid;
+	fl->gid = inode->i_gid;
+
+	error = nfsd4_set_deviceid(&fl->deviceid, fhp, device_generation);
+	if (error)
+		goto out_error;
+
+	fl->fh.size = fhp->fh_handle.fh_size;
+	memcpy(fl->fh.data, &fhp->fh_handle.fh_base, fl->fh.size);
+
+	/* Give whole file layout segments */
+	seg->offset = 0;
+	seg->length = NFS4_MAX_UINT64;
+
+	dprintk("GET: 0x%llx:0x%llx %d\n", seg->offset, seg->length,
+		seg->iomode);
+	return 0;
+
+out_error:
+	kfree(fl);
+	seg->length = 0;
+	return nfserrno(error);
+out_layoutunavailable:
+	seg->length = 0;
+	return nfserr_layoutunavailable;
+}
+
+#ifdef CONFIG_NFSD_FLEXFILELAYOUT
+static __be32
+nfsd4_ff_proc_getdeviceinfo(struct super_block *sb,
+		struct svc_rqst *rqstp,
+		struct nfs4_client *clp,
+		struct nfsd4_getdeviceinfo *gdp)
+{
+	struct pnfs_ff_device_addr *da;
+
+	u16 port;
+	char addr[INET6_ADDRSTRLEN];
+
+	if (sb->s_bdev != sb->s_bdev->bd_contains)
+		return nfserr_inval;
+
+	da = kzalloc(sizeof(struct pnfs_ff_device_addr), GFP_KERNEL);
+	if (!da)
+		return nfserrno(-ENOMEM);
+
+	gdp->gd_device = da;
+
+	da->version = 3;
+	da->minor_version = 0;
+
+	/* FIXME: Get from export? */
+	da->rsize = 4096;
+	da->wsize = 4096;
+
+	rpc_ntop((struct sockaddr *)&rqstp->rq_daddr,
+		 addr, INET6_ADDRSTRLEN);
+	if (rqstp->rq_daddr.ss_family == AF_INET) {
+		struct sockaddr_in *sin;
+
+		sin = (struct sockaddr_in *)&rqstp->rq_daddr;
+		port = ntohs(sin->sin_port);
+		snprintf(da->netaddr.netid, FF_NETID_LEN + 1, "tcp");
+		da->netaddr.netid_len = 3;
+	} else {
+		struct sockaddr_in6 *sin6;
+
+		sin6 = (struct sockaddr_in6 *)&rqstp->rq_daddr;
+		port = ntohs(sin6->sin6_port);
+		snprintf(da->netaddr.netid, FF_NETID_LEN + 1, "tcp6");
+		da->netaddr.netid_len = 4;
+	}
+
+	da->netaddr.addr_len =
+		snprintf(da->netaddr.addr, FF_ADDR_LEN + 1,
+			 "%s.%hhu.%hhu", addr, port >> 8, port & 0xff);
+
+	da->tightly_coupled = false;
+
+	return 0;
+}
+
+const struct nfsd4_layout_ops ff_layout_ops = {
+	.notify_types		=
+			NOTIFY_DEVICEID4_DELETE | NOTIFY_DEVICEID4_CHANGE,
+	.proc_getdeviceinfo	= nfsd4_ff_proc_getdeviceinfo,
+	.encode_getdeviceinfo	= nfsd4_ff_encode_getdeviceinfo,
+	.proc_layoutget		= nfsd4_ff_proc_layoutget,
+	.encode_layoutget	= nfsd4_ff_encode_layoutget,
+};
+#endif /* CONFIG_NFSD_FLEXFILELAYOUT */
diff --git a/fs/nfsd/flexfilelayoutxdr.c b/fs/nfsd/flexfilelayoutxdr.c
new file mode 100644
index 0000000..9d15ee0
--- /dev/null
+++ b/fs/nfsd/flexfilelayoutxdr.c
@@ -0,0 +1,116 @@
+/*
+ * Copyright (c) 2016 Tom Haynes <loghyr@primarydata.com>
+ */
+#include <linux/sunrpc/svc.h>
+#include <linux/exportfs.h>
+#include <linux/nfs4.h>
+
+#include "nfsd.h"
+#include "flexfilelayoutxdr.h"
+
+#define NFSDDBG_FACILITY	NFSDDBG_PNFS
+
+struct ff_idmap {
+	char buf[11];
+	int len;
+};
+
+__be32
+nfsd4_ff_encode_layoutget(struct xdr_stream *xdr,
+		struct nfsd4_layoutget *lgp)
+{
+	struct pnfs_ff_layout *fl = lgp->lg_content;
+	int len, mirror_len, ds_len, fh_len;
+	__be32 *p;
+
+	/*
+	 * Unlike nfsd4_encode_user, we know these will
+	 * always be stringified.
+	 */
+	struct ff_idmap uid;
+	struct ff_idmap gid;
+
+	fh_len = 4 + fl->fh.size;
+
+	uid.len = sprintf(uid.buf, "%u", from_kuid(&init_user_ns, fl->uid));
+	gid.len = sprintf(gid.buf, "%u", from_kgid(&init_user_ns, fl->gid));
+
+	/* 8 + len for recording the length, name, and padding */
+	ds_len = 20 + sizeof(stateid_opaque_t) + 4 + fh_len +
+		 8 + uid.len + 8 + gid.len;
+
+	mirror_len = 4 + ds_len;
+
+	/* The layout segment */
+	len = 20 + mirror_len;
+
+	p = xdr_reserve_space(xdr, sizeof(__be32) + len);
+	if (!p)
+		return nfserr_toosmall;
+
+	*p++ = cpu_to_be32(len);
+	p = xdr_encode_hyper(p, 1);		/* stripe unit of 1 */
+
+	*p++ = cpu_to_be32(1);			/* single mirror */
+	*p++ = cpu_to_be32(1);			/* single data server */
+
+	p = xdr_encode_opaque_fixed(p, &fl->deviceid,
+			sizeof(struct nfsd4_deviceid));
+
+	*p++ = cpu_to_be32(1);			/* efficiency */
+
+	*p++ = cpu_to_be32(fl->stateid.si_generation);
+	p = xdr_encode_opaque_fixed(p, &fl->stateid.si_opaque,
+				    sizeof(stateid_opaque_t));
+
+	*p++ = cpu_to_be32(1);			/* single file handle */
+	p = xdr_encode_opaque(p, fl->fh.data, fl->fh.size);
+
+	p = xdr_encode_opaque(p, uid.buf, uid.len);
+	p = xdr_encode_opaque(p, gid.buf, gid.len);
+
+	*p++ = cpu_to_be32(fl->flags);
+	*p++ = cpu_to_be32(0);			/* No stats collect hint */
+
+	return 0;
+}
+
+__be32
+nfsd4_ff_encode_getdeviceinfo(struct xdr_stream *xdr,
+		struct nfsd4_getdeviceinfo *gdp)
+{
+	struct pnfs_ff_device_addr *da = gdp->gd_device;
+	int len;
+	int ver_len;
+	int addr_len;
+	__be32 *p;
+
+	/* len + padding for two strings */
+	addr_len = 16 + da->netaddr.netid_len + da->netaddr.addr_len;
+	ver_len = 20;
+
+	len = 4 + ver_len + 4 + addr_len;
+
+	p = xdr_reserve_space(xdr, len + sizeof(__be32));
+	if (!p)
+		return nfserr_resource;
+
+	/*
+	 * Fill in the overall length and number of volumes at the beginning
+	 * of the layout.
+	 */
+	*p++ = cpu_to_be32(len);
+	*p++ = cpu_to_be32(1);			/* 1 netaddr */
+	p = xdr_encode_opaque(p, da->netaddr.netid, da->netaddr.netid_len);
+	p = xdr_encode_opaque(p, da->netaddr.addr, da->netaddr.addr_len);
+
+	*p++ = cpu_to_be32(1);			/* 1 versions */
+
+	*p++ = cpu_to_be32(da->version);
+	*p++ = cpu_to_be32(da->minor_version);
+	*p++ = cpu_to_be32(da->rsize);
+	*p++ = cpu_to_be32(da->wsize);
+	*p++ = cpu_to_be32(da->tightly_coupled);
+
+	return 0;
+}
diff --git a/fs/nfsd/flexfilelayoutxdr.h b/fs/nfsd/flexfilelayoutxdr.h
new file mode 100644
index 0000000..40e6d1b
--- /dev/null
+++ b/fs/nfsd/flexfilelayoutxdr.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright (c) 2016 Tom Haynes <loghyr@primarydata.com>
+ */
+#ifndef _NFSD_FLEXFILELAYOUTXDR_H
+#define _NFSD_FLEXFILELAYOUTXDR_H 1
+
+#include <linux/inet.h>
+#include "xdr4.h"
+
+#define FF_FLAGS_NO_LAYOUTCOMMIT 1
+#define FF_FLAGS_NO_IO_THRU_MDS  2
+#define FF_FLAGS_NO_READ_IO      4
+
+struct iomap;
+struct xdr_stream;
+
+#define FF_NETID_LEN		(4)
+#define FF_ADDR_LEN		(INET6_ADDRSTRLEN + 1)
+struct pnfs_ff_netaddr {
+	char				netid[FF_NETID_LEN + 1];
+	char				addr[FF_ADDR_LEN + 1];
+	u32				netid_len;
+	u32				addr_len;
+};
+
+struct pnfs_ff_device_addr {
+	struct pnfs_ff_netaddr		netaddr;
+	u32				version;
+	u32				minor_version;
+	u32				rsize;
+	u32				wsize;
+	bool				tightly_coupled;
+};
+
+struct pnfs_ff_layout {
+	u32				flags;
+	u32				stats_collect_hint;
+	kuid_t				uid;
+	kgid_t				gid;
+	struct nfsd4_deviceid		deviceid;
+	stateid_t			stateid;
+	struct nfs_fh			fh;
+};
+
+__be32 nfsd4_ff_encode_getdeviceinfo(struct xdr_stream *xdr,
+		struct nfsd4_getdeviceinfo *gdp);
+__be32 nfsd4_ff_encode_layoutget(struct xdr_stream *xdr,
+		struct nfsd4_layoutget *lgp);
+
+#endif /* _NFSD_FLEXFILELAYOUTXDR_H */
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 825c7bc..7cbd56a 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -27,6 +27,9 @@ static const struct nfsd4_callback_ops nfsd4_cb_layout_ops;
 static const struct lock_manager_operations nfsd4_layouts_lm_ops;
 
 const struct nfsd4_layout_ops *nfsd4_layout_ops[LAYOUT_TYPE_MAX] =  {
+#ifdef CONFIG_NFSD_FLEXFILELAYOUT
+	[LAYOUT_FLEX_FILES]	= &ff_layout_ops,
+#endif
 #ifdef CONFIG_NFSD_BLOCKLAYOUT
 	[LAYOUT_BLOCK_VOLUME]	= &bl_layout_ops,
 #endif
@@ -122,7 +125,9 @@ nfsd4_set_deviceid(struct nfsd4_deviceid *id, const struct svc_fh *fhp,
 
 void nfsd4_setup_layout_type(struct svc_export *exp)
 {
+#if defined(CONFIG_NFSD_BLOCKLAYOUT) || defined(CONFIG_NFSD_SCSILAYOUT)
 	struct super_block *sb = exp->ex_path.mnt->mnt_sb;
+#endif
 
 	if (!(exp->ex_flags & NFSEXP_PNFS))
 		return;
@@ -145,6 +150,11 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
 	    sb->s_bdev && sb->s_bdev->bd_disk->fops->pr_ops)
 		exp->ex_layout_type = LAYOUT_SCSI;
 #endif
+#ifdef CONFIG_NFSD_FLEXFILELAYOUT
+	// FIXME: How do we "export" this and how does it mingle with
+	// the above types?
+	exp->ex_layout_type = LAYOUT_FLEX_FILES;
+#endif
 }
 
 static void
diff --git a/fs/nfsd/pnfs.h b/fs/nfsd/pnfs.h
index e855677..0c2a716 100644
--- a/fs/nfsd/pnfs.h
+++ b/fs/nfsd/pnfs.h
@@ -45,6 +45,9 @@ extern const struct nfsd4_layout_ops bl_layout_ops;
 #ifdef CONFIG_NFSD_SCSILAYOUT
 extern const struct nfsd4_layout_ops scsi_layout_ops;
 #endif
+#ifdef CONFIG_NFSD_FLEXFILELAYOUT
+extern const struct nfsd4_layout_ops ff_layout_ops;
+#endif
 
 __be32 nfsd4_preprocess_layout_stateid(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-- 
1.8.3.1
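
The length bookkeeping in nfsd4_ff_encode_layoutget() depends on how many
bytes a variable-length XDR opaque occupies on the wire: a 4-byte length
word, the data, and zero padding up to the next 4-byte boundary. The sketch
below computes that figure in user space. xdr_pad() and xdr_opaque_size()
are hypothetical helper names, not kernel functions.

```c
#include <stddef.h>

#define XDR_UNIT 4	/* XDR encodes everything in 4-byte units */

/* Number of zero bytes needed to pad len up to a 4-byte boundary. */
size_t xdr_pad(size_t len)
{
	return (XDR_UNIT - (len % XDR_UNIT)) % XDR_UNIT;
}

/* Wire size of a variable-length opaque: length word + data + padding. */
size_t xdr_opaque_size(size_t len)
{
	return XDR_UNIT + len + xdr_pad(len);
}
```

A stringified uid of "0" (1 byte) and one of "1000" (4 bytes) both occupy
8 bytes on the wire, while "65534" (5 bytes) occupies 12. The "8 + len"
budget used for the uid and gid strings in the patch is therefore an upper
bound on this figure (1 to 4 bytes larger), not the exact wire size.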



* [PATCH 4/4] nfsd: Provide a config option for flex file layouts
  2016-05-25  5:09 [PATCH 0/4] Super simple flex file server Tom Haynes
                   ` (2 preceding siblings ...)
  2016-05-25  5:09 ` [PATCH 3/4] nfsd: Add a super simple flex file server Tom Haynes
@ 2016-05-25  5:09 ` Tom Haynes
  2016-05-25 15:09   ` Christoph Hellwig
  3 siblings, 1 reply; 22+ messages in thread
From: Tom Haynes @ 2016-05-25  5:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linux NFS Mailing list, Christoph Hellwig

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
---
 fs/nfsd/Kconfig | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index c9f583d..76edffb 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -111,6 +111,19 @@ config NFSD_SCSILAYOUT
 
 	  If unsure, say N.
 
+config NFSD_FLEXFILELAYOUT
+	bool "NFSv4.1 server support for pNFS Flex File layouts"
+	depends on NFSD_V4
+	select NFSD_PNFS
+	help
+	  This option enables support for exporting pNFS Flex File
+	  layouts in the kernel's NFS server. The pNFS Flex File layout
+	  enables NFS clients to directly perform I/O to NFSv3 devices
+	  accessible to both the server and the clients.  See
+	  draft-ietf-nfsv4-flex-files for more details.
+
+	  If unsure, say N.
+
 config NFSD_V4_SECURITY_LABEL
 	bool "Provide Security Label support for NFSv4 server"
 	depends on NFSD_V4 && SECURITY
-- 
1.8.3.1



* Re: [PATCH 1/4] nfsd: flex file device id encoding will need the server addres
  2016-05-25  5:09 ` [PATCH 1/4] nfsd: flex file device id encoding will need the server addres Tom Haynes
@ 2016-05-25 11:49   ` Jeff Layton
  2016-05-25 15:08   ` Christoph Hellwig
  1 sibling, 0 replies; 22+ messages in thread
From: Jeff Layton @ 2016-05-25 11:49 UTC (permalink / raw)
  To: Tom Haynes, J. Bruce Fields; +Cc: Linux NFS Mailing list, Christoph Hellwig

On Tue, 2016-05-24 at 22:09 -0700, Tom Haynes wrote:
> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
> ---
>  fs/nfsd/blocklayout.c | 2 ++
>  fs/nfsd/nfs4proc.c    | 1 +
>  fs/nfsd/pnfs.h        | 1 +
>  3 files changed, 4 insertions(+)
> 
> diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
> index e55b524..248adb6 100644
> --- a/fs/nfsd/blocklayout.c
> +++ b/fs/nfsd/blocklayout.c
> @@ -162,6 +162,7 @@ nfsd4_block_get_device_info_simple(struct super_block *sb,
>  
>  static __be32
>  nfsd4_block_proc_getdeviceinfo(struct super_block *sb,
> +		struct svc_rqst *rqstp,
>  		struct nfs4_client *clp,
>  		struct nfsd4_getdeviceinfo *gdp)
>  {
> @@ -354,6 +355,7 @@ nfsd4_block_get_device_info_scsi(struct super_block *sb,
>  
>  static __be32
>  nfsd4_scsi_proc_getdeviceinfo(struct super_block *sb,
> +		struct svc_rqst *rqstp,
>  		struct nfs4_client *clp,
>  		struct nfsd4_getdeviceinfo *gdp)
>  {
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index de1ff1d..b28e45b 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1270,6 +1270,7 @@ nfsd4_getdeviceinfo(struct svc_rqst *rqstp,
>  	nfserr = nfs_ok;
>  	if (gdp->gd_maxcount != 0) {
>  		nfserr = ops->proc_getdeviceinfo(exp->ex_path.mnt->mnt_sb,
> +					rqstp,
>  					cstate->session->se_client, gdp);
>  	}
>  
> diff --git a/fs/nfsd/pnfs.h b/fs/nfsd/pnfs.h
> index 7d073b9..e855677 100644
> --- a/fs/nfsd/pnfs.h
> +++ b/fs/nfsd/pnfs.h
> @@ -21,6 +21,7 @@ struct nfsd4_layout_ops {
>  	u32		notify_types;
>  
>  	__be32 (*proc_getdeviceinfo)(struct super_block *sb,
> +			struct svc_rqst *rqstp,
>  			struct nfs4_client *clp,
>  			struct nfsd4_getdeviceinfo *gdevp);
>  	__be32 (*encode_getdeviceinfo)(struct xdr_stream *xdr,

Looks fine.

Reviewed-by: Jeff Layton <jlayton@poochiereds.net>


* Re: [PATCH 2/4] nfsd: Can leak pnfs_block_extent on error
  2016-05-25  5:09 ` [PATCH 2/4] nfsd: Can leak pnfs_block_extent on error Tom Haynes
@ 2016-05-25 11:50   ` Jeff Layton
  2016-05-25 15:07   ` Christoph Hellwig
  1 sibling, 0 replies; 22+ messages in thread
From: Jeff Layton @ 2016-05-25 11:50 UTC (permalink / raw)
  To: Tom Haynes, J. Bruce Fields; +Cc: Linux NFS Mailing list, Christoph Hellwig

On Tue, 2016-05-24 at 22:09 -0700, Tom Haynes wrote:
> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
> ---
>  fs/nfsd/blocklayout.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
> index 248adb6..9a195f1 100644
> --- a/fs/nfsd/blocklayout.c
> +++ b/fs/nfsd/blocklayout.c
> @@ -23,7 +23,7 @@ nfsd4_block_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
>  	struct nfsd4_layout_seg *seg = &args->lg_seg;
>  	struct super_block *sb = inode->i_sb;
>  	u32 block_size = (1 << inode->i_blkbits);
> -	struct pnfs_block_extent *bex;
> +	struct pnfs_block_extent *bex = NULL;
>  	struct iomap iomap;
>  	u32 device_generation = 0;
>  	int error;
> @@ -105,9 +105,11 @@ nfsd4_block_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
>  	return 0;
>  
>  out_error:
> +	kfree(bex);
>  	seg->length = 0;
>  	return nfserrno(error);
>  out_layoutunavailable:
> +	kfree(bex);
>  	seg->length = 0;
>  	return nfserr_layoutunavailable;
>  }

Nice catch! Might be reasonable for stable?

Reviewed-by: Jeff Layton <jlayton@poochiereds.net>


* Re: [PATCH 3/4] nfsd: Add a super simple flex file server
  2016-05-25  5:09 ` [PATCH 3/4] nfsd: Add a super simple flex file server Tom Haynes
@ 2016-05-25 12:00   ` Jeff Layton
  2016-05-25 12:30   ` Jeff Layton
  2016-05-25 15:15   ` Christoph Hellwig
  2 siblings, 0 replies; 22+ messages in thread
From: Jeff Layton @ 2016-05-25 12:00 UTC (permalink / raw)
  To: Tom Haynes, J. Bruce Fields; +Cc: Linux NFS Mailing list, Christoph Hellwig

On Tue, 2016-05-24 at 22:09 -0700, Tom Haynes wrote:
> Have a simple flex file server where the mds (NFSv4.1 or NFSv4.2)
> is also the ds (NFSv3). I.e., the metadata and the data file are
> the exact same file.
> 
> This will allow testing of the flex file client.
> 
> Simply add the "pnfs" export option to your export
> in /etc/exports and mount from a client that supports
> flex files.
> 
> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
> ---
>  fs/nfsd/Makefile            |   1 +
>  fs/nfsd/flexfilelayout.c    | 148 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/nfsd/flexfilelayoutxdr.c | 116 ++++++++++++++++++++++++++++++++++
>  fs/nfsd/flexfilelayoutxdr.h |  50 +++++++++++++++
>  fs/nfsd/nfs4layouts.c       |  10 +++
>  fs/nfsd/pnfs.h              |   3 +
>  6 files changed, 328 insertions(+)
>  create mode 100644 fs/nfsd/flexfilelayout.c
>  create mode 100644 fs/nfsd/flexfilelayoutxdr.c
>  create mode 100644 fs/nfsd/flexfilelayoutxdr.h
> 
> diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> index 3ae5f3c..5f5d3a7 100644
> --- a/fs/nfsd/Makefile
> +++ b/fs/nfsd/Makefile
> @@ -20,3 +20,4 @@ nfsd-$(CONFIG_NFSD_V4)	+= nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \
>  nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
>  nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
>  nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
> +nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> diff --git a/fs/nfsd/flexfilelayout.c b/fs/nfsd/flexfilelayout.c
> new file mode 100644
> index 0000000..d28b8a0
> --- /dev/null
> +++ b/fs/nfsd/flexfilelayout.c
> @@ -0,0 +1,148 @@
> +/*
> + * Copyright (c) 2016 Tom Haynes <loghyr@primarydata.com>
> + *
> + * The following implements a super-simple flex-file server
> + * where the NFSv4.1 mds is also the ds. And the storage is
> + * the same. I.e., writing to the mds via a NFSv4.1 WRITE
> + * goes to the same location as the NFSv3 WRITE.
> + */
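> +#include <linux/exportfs.h>
> +#include <linux/genhd.h>
> +#include <linux/slab.h>
> +#include <linux/pr.h>
> +
> +#include <linux/nfsd/debug.h>
> +
> +#include <linux/sunrpc/addr.h>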
> +
> +#include "flexfilelayoutxdr.h"
> +#include "pnfs.h"
> +
> +#define NFSDDBG_FACILITY	NFSDDBG_PNFS
> +
> +static __be32
> +nfsd4_ff_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
> +		struct nfsd4_layoutget *args)
> +{
> +	struct nfsd4_layout_seg *seg = &args->lg_seg;
> +	u32 block_size = (1 << inode->i_blkbits);
> +	u32 device_generation = 0;
> +	int error;
> +
> +	struct pnfs_ff_layout *fl;
> +
> +	if (seg->offset & (block_size - 1)) {
> +		dprintk("pnfsd: I/O misaligned\n");
> +		goto out_layoutunavailable;
> +	}
> +
> +	/*
> +	 * The super simple flex file server has 1 mirror, 1 data server,
> +	 * and 1 file handle. So instead of 4 allocs, do 1 for now.
> +	 * Zero it out for the stateid - don't want junk in there!
> +	 */
> +	error = -ENOMEM;
> +	fl = kzalloc(sizeof(*fl), GFP_KERNEL);
> +	if (!fl)
> +		goto out_error;
> +	args->lg_content = fl;
> +
> +	/*
> +	 * Avoid layout commit, try to force the I/O to the DS,
> +	 * and for fun, cause all IOMODE_RW layout segments to
> +	 * effectively be WRITE only.
> +	 */
> +	fl->flags = FF_FLAGS_NO_LAYOUTCOMMIT | FF_FLAGS_NO_IO_THRU_MDS |
> +		    FF_FLAGS_NO_READ_IO;
> +
> +	fl->uid = inode->i_uid;
> +	fl->gid = inode->i_gid;
> +
> +	error = nfsd4_set_deviceid(&fl->deviceid, fhp, device_generation);
> +	if (error)
> +		goto out_error;
> +
> +	fl->fh.size = fhp->fh_handle.fh_size;
> +	memcpy(fl->fh.data, &fhp->fh_handle.fh_base, fl->fh.size);
> +
> +	/* Give whole file layout segments */
> +	seg->offset = 0;
> +	seg->length = NFS4_MAX_UINT64;
> +
> +	dprintk("GET: 0x%llx:0x%llx %d\n", seg->offset, seg->length,
> +		seg->iomode);
> +	return 0;
> +
> +out_error:
> +	kfree(fl);
> +	seg->length = 0;
> +	return nfserrno(error);
> +out_layoutunavailable:
> +	seg->length = 0;
> +	return nfserr_layoutunavailable;
> +}
> +
> +#ifdef CONFIG_NFSD_FLEXFILELAYOUT
> +static __be32
> +nfsd4_ff_proc_getdeviceinfo(struct super_block *sb,
> +		struct svc_rqst *rqstp,
> +		struct nfs4_client *clp,
> +		struct nfsd4_getdeviceinfo *gdp)
> +{
> +	struct pnfs_ff_device_addr *da;
> +
> +	u16 port;
> +	char addr[INET6_ADDRSTRLEN];
> +
> +	if (sb->s_bdev != sb->s_bdev->bd_contains)
> +		return nfserr_inval;
> +
> +	da = kzalloc(sizeof(struct pnfs_ff_device_addr), GFP_KERNEL);
> +	if (!da)
> +		return nfserrno(-ENOMEM);
> +
> +	gdp->gd_device = da;
> +
> +	da->version = 3;
> +	da->minor_version = 0;
> +
> +	/* FIXME: Get from export? */
> +	da->rsize = 4096;
> +	da->wsize = 4096;
> +

nfsd3_proc_fsinfo fills out its rsize/wsize with
svc_max_payload(rqstp). I'd suggest doing the same here.

> +	rpc_ntop((struct sockaddr *)&rqstp->rq_daddr,
> +		 addr, INET6_ADDRSTRLEN);
> +	if (rqstp->rq_daddr.ss_family == AF_INET) {
> +		struct sockaddr_in *sin;
> +
> +		sin = (struct sockaddr_in *)&rqstp->rq_daddr;
> +		port = ntohs(sin->sin_port);
> +		snprintf(da->netaddr.netid, FF_NETID_LEN + 1, "tcp");
> +		da->netaddr.netid_len = 3;
> +	} else {
> +		struct sockaddr_in6 *sin6;
> +
> +		sin6 = (struct sockaddr_in6 *)&rqstp->rq_daddr;
> +		port = ntohs(sin6->sin6_port);
> +		snprintf(da->netaddr.netid, FF_NETID_LEN + 1, "tcp6");
> +		da->netaddr.netid_len = 4;
> +	}
> +
> +	da->netaddr.addr_len =
> +		snprintf(da->netaddr.addr, FF_ADDR_LEN + 1,
> +			 "%s.%hhu.%hhu", addr, port >> 8, port & 0xff);
> +
> +	da->tightly_coupled = false;
> +
> +	return 0;
> +}
> +
> +const struct nfsd4_layout_ops ff_layout_ops = {
> +	.notify_types		=
> +			NOTIFY_DEVICEID4_DELETE | NOTIFY_DEVICEID4_CHANGE,
> +	.proc_getdeviceinfo	= nfsd4_ff_proc_getdeviceinfo,
> +	.encode_getdeviceinfo	= nfsd4_ff_encode_getdeviceinfo,
> +	.proc_layoutget		= nfsd4_ff_proc_layoutget,
> +	.encode_layoutget	= nfsd4_ff_encode_layoutget,
> +};
> +#endif /* CONFIG_NFSD_FLEXFILELAYOUT */
> diff --git a/fs/nfsd/flexfilelayoutxdr.c b/fs/nfsd/flexfilelayoutxdr.c
> new file mode 100644
> index 0000000..9d15ee0
> --- /dev/null
> +++ b/fs/nfsd/flexfilelayoutxdr.c
> @@ -0,0 +1,116 @@
> +/*
> + * Copyright (c) 2016 Tom Haynes <loghyr@primarydata.com>
> + */
> +#include <linux/sunrpc/svc.h>
> +#include <linux/exportfs.h>
> +#include <linux/nfs4.h>
> +
> +#include "nfsd.h"
> +#include "flexfilelayoutxdr.h"
> +
> +#define NFSDDBG_FACILITY	NFSDDBG_PNFS
> +
> +struct ff_idmap {
> +	char buf[11];
> +	int len;
> +};
> +
> +__be32
> +nfsd4_ff_encode_layoutget(struct xdr_stream *xdr,
> +		struct nfsd4_layoutget *lgp)
> +{
> +	struct pnfs_ff_layout *fl = lgp->lg_content;
> +	int len, mirror_len, ds_len, fh_len;
> +	__be32 *p;
> +
> +	/*
> +	 * Unlike nfsd4_encode_user, we know these will
> +	 * always be stringified.
> +	 */
> +	struct ff_idmap uid;
> +	struct ff_idmap gid;
> +
> +	fh_len = 4 + fl->fh.size;
> +
> +	uid.len = sprintf(uid.buf, "%u", from_kuid(&init_user_ns, fl->uid));
> +	gid.len = sprintf(gid.buf, "%u", from_kgid(&init_user_ns, fl->gid));
> +
> +	/* 8 + len for recording the length, name, and padding */
> +	ds_len = 20 + sizeof(stateid_opaque_t) + 4 + fh_len +
> +		 8 + uid.len + 8 + gid.len;
> +
> +	mirror_len = 4 + ds_len;
> +
> +	/* The layout segment */
> +	len = 20 + mirror_len;
> +
> +	p = xdr_reserve_space(xdr, sizeof(__be32) + len);
> +	if (!p)
> +		return nfserr_toosmall;
> +
> +	*p++ = cpu_to_be32(len);
> +	p = xdr_encode_hyper(p, 1);		/* stripe unit of 1 */
> +
> +	*p++ = cpu_to_be32(1);			/* single mirror */
> +	*p++ = cpu_to_be32(1);			/* single data server */
> +
> +	p = xdr_encode_opaque_fixed(p, &fl->deviceid,
> +			sizeof(struct nfsd4_deviceid));
> +
> +	*p++ = cpu_to_be32(1);			/* efficiency */
> +
> +	*p++ = cpu_to_be32(fl->stateid.si_generation);
> +	p = xdr_encode_opaque_fixed(p, &fl->stateid.si_opaque,
> +				    sizeof(stateid_opaque_t));
> +
> +	*p++ = cpu_to_be32(1);			/* single file handle */
> +	p = xdr_encode_opaque(p, fl->fh.data, fl->fh.size);
> +
> +	p = xdr_encode_opaque(p, uid.buf, uid.len);
> +	p = xdr_encode_opaque(p, gid.buf, gid.len);
> +
> +	*p++ = cpu_to_be32(fl->flags);
> +	*p++ = cpu_to_be32(0);			/* No stats collect hint */
> +
> +	return 0;
> +}
> +
> +__be32
> +nfsd4_ff_encode_getdeviceinfo(struct xdr_stream *xdr,
> +		struct nfsd4_getdeviceinfo *gdp)
> +{
> +	struct pnfs_ff_device_addr *da = gdp->gd_device;
> +	int len;
> +	int ver_len;
> +	int addr_len;
> +	__be32 *p;
> +
> +	/* len + padding for two strings */
> +	addr_len = 16 + da->netaddr.netid_len + da->netaddr.addr_len;
> +	ver_len = 20;
> +
> +	len = 4 + ver_len + 4 + addr_len;
> +
> +	p = xdr_reserve_space(xdr, len + sizeof(__be32));
> +	if (!p)
> +		return nfserr_resource;
> +
> +	/*
> +	 * Fill in the overall length and number of volumes at the beginning
> +	 * of the layout.
> +	 */
> +	*p++ = cpu_to_be32(len);
> +	*p++ = cpu_to_be32(1);			/* 1 netaddr */
> +	p = xdr_encode_opaque(p, da->netaddr.netid, da->netaddr.netid_len);
> +	p = xdr_encode_opaque(p, da->netaddr.addr, da->netaddr.addr_len);
> +
> +	*p++ = cpu_to_be32(1);			/* 1 versions */
> +
> +	*p++ = cpu_to_be32(da->version);
> +	*p++ = cpu_to_be32(da->minor_version);
> +	*p++ = cpu_to_be32(da->rsize);
> +	*p++ = cpu_to_be32(da->wsize);
> +	*p++ = cpu_to_be32(da->tightly_coupled);
> +
> +	return 0;
> +}
> diff --git a/fs/nfsd/flexfilelayoutxdr.h b/fs/nfsd/flexfilelayoutxdr.h
> new file mode 100644
> index 0000000..40e6d1b
> --- /dev/null
> +++ b/fs/nfsd/flexfilelayoutxdr.h
> @@ -0,0 +1,50 @@
> +/*
> + * Copyright (c) 2016 Tom Haynes <loghyr@primarydata.com>
> + */
> +#ifndef _NFSD_FLEXFILELAYOUTXDR_H
> +#define _NFSD_FLEXFILELAYOUTXDR_H 1
> +
> +#include 
> +#include "xdr4.h"
> +
> +#define FF_FLAGS_NO_LAYOUTCOMMIT 1
> +#define FF_FLAGS_NO_IO_THRU_MDS  2
> +#define FF_FLAGS_NO_READ_IO      4
> +
> +struct iomap;
> +struct xdr_stream;
> +
> +#define FF_NETID_LEN		(4)
> +#define FF_ADDR_LEN		(INET6_ADDRSTRLEN + 1)
> +struct pnfs_ff_netaddr {
> +	char				netid[FF_NETID_LEN + 1];
> +	char				addr[FF_ADDR_LEN + 1];
> +	u32				netid_len;
> +	u32				addr_len;
> +};
> +
> +struct pnfs_ff_device_addr {
> +	struct pnfs_ff_netaddr		netaddr;
> +	u32				version;
> +	u32				minor_version;
> +	u32				rsize;
> +	u32				wsize;
> +	bool				tightly_coupled;
> +};
> +
> +struct pnfs_ff_layout {
> +	u32				flags;
> +	u32				stats_collect_hint;
> +	kuid_t				uid;
> +	kgid_t				gid;
> +	struct nfsd4_deviceid		deviceid;
> +	stateid_t			stateid;
> +	struct nfs_fh			fh;
> +};
> +
> +__be32 nfsd4_ff_encode_getdeviceinfo(struct xdr_stream *xdr,
> +		struct nfsd4_getdeviceinfo *gdp);
> +__be32 nfsd4_ff_encode_layoutget(struct xdr_stream *xdr,
> +		struct nfsd4_layoutget *lgp);
> +
> +#endif /* _NFSD_FLEXFILELAYOUTXDR_H */
> diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
> index 825c7bc..7cbd56a 100644
> --- a/fs/nfsd/nfs4layouts.c
> +++ b/fs/nfsd/nfs4layouts.c
> @@ -27,6 +27,9 @@ static const struct nfsd4_callback_ops nfsd4_cb_layout_ops;
>  static const struct lock_manager_operations nfsd4_layouts_lm_ops;
>  
>  const struct nfsd4_layout_ops *nfsd4_layout_ops[LAYOUT_TYPE_MAX] =  {
> +#ifdef CONFIG_NFSD_FLEXFILELAYOUT
> +	[LAYOUT_FLEX_FILES]	= &ff_layout_ops,
> +#endif
>  #ifdef CONFIG_NFSD_BLOCKLAYOUT
>  	[LAYOUT_BLOCK_VOLUME]	= &bl_layout_ops,
>  #endif
> @@ -122,7 +125,9 @@ nfsd4_set_deviceid(struct nfsd4_deviceid *id, const struct svc_fh *fhp,
>  
>  void nfsd4_setup_layout_type(struct svc_export *exp)
>  {
> +#if defined(CONFIG_NFSD_BLOCKLAYOUT) || defined(CONFIG_NFSD_SCSILAYOUT)
>  	struct super_block *sb = exp->ex_path.mnt->mnt_sb;
> +#endif
>  
>  	if (!(exp->ex_flags & NFSEXP_PNFS))
>  		return;
> @@ -145,6 +150,11 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
>  	    sb->s_bdev && sb->s_bdev->bd_disk->fops->pr_ops)
>  		exp->ex_layout_type = LAYOUT_SCSI;
>  #endif
> +#ifdef CONFIG_NFSD_FLEXFILELAYOUT
> +	// FIXME: How do we "export" this and how does it mingle with
> +	// the above types?
> +	exp->ex_layout_type = LAYOUT_FLEX_FILES;
> +#endif
>  }
>  
>  static void
> diff --git a/fs/nfsd/pnfs.h b/fs/nfsd/pnfs.h
> index e855677..0c2a716 100644
> --- a/fs/nfsd/pnfs.h
> +++ b/fs/nfsd/pnfs.h
> @@ -45,6 +45,9 @@ extern const struct nfsd4_layout_ops bl_layout_ops;
>  #ifdef CONFIG_NFSD_SCSILAYOUT
>  extern const struct nfsd4_layout_ops scsi_layout_ops;
>  #endif
> +#ifdef CONFIG_NFSD_FLEXFILELAYOUT
> +extern const struct nfsd4_layout_ops ff_layout_ops;
> +#endif
>  
>  __be32 nfsd4_preprocess_layout_stateid(struct svc_rqst *rqstp,
>  		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: [PATCH 3/4] nfsd: Add a super simple flex file server
  2016-05-25  5:09 ` [PATCH 3/4] nfsd: Add a super simple flex file server Tom Haynes
  2016-05-25 12:00   ` Jeff Layton
@ 2016-05-25 12:30   ` Jeff Layton
  2016-05-25 14:41     ` Thomas Haynes
  2016-05-25 17:42     ` J. Bruce Fields
  2016-05-25 15:15   ` Christoph Hellwig
  2 siblings, 2 replies; 22+ messages in thread
From: Jeff Layton @ 2016-05-25 12:30 UTC (permalink / raw)
  To: Tom Haynes, J. Bruce Fields; +Cc: Linux NFS Mailing list, Christoph Hellwig

On Tue, 2016-05-24 at 22:09 -0700, Tom Haynes wrote:
> Have a simple flex file server where the mds (NFSv4.1 or NFSv4.2)
> is also the ds (NFSv3). I.e., the metadata and the data file are
> the exact same file.
> 
> This will allow testing of the flex file client.
> 
> Simply add the "pnfs" export option to your export
> in /etc/exports and mount from a client that supports
> flex files.
> 
> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
> ---
>  fs/nfsd/Makefile            |   1 +
>  fs/nfsd/flexfilelayout.c    | 148 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/nfsd/flexfilelayoutxdr.c | 116 ++++++++++++++++++++++++++++++++++
>  fs/nfsd/flexfilelayoutxdr.h |  50 +++++++++++++++
>  fs/nfsd/nfs4layouts.c       |  10 +++
>  fs/nfsd/pnfs.h              |   3 +
>  6 files changed, 328 insertions(+)
>  create mode 100644 fs/nfsd/flexfilelayout.c
>  create mode 100644 fs/nfsd/flexfilelayoutxdr.c
>  create mode 100644 fs/nfsd/flexfilelayoutxdr.h
> 
> diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> index 3ae5f3c..5f5d3a7 100644
> --- a/fs/nfsd/Makefile
> +++ b/fs/nfsd/Makefile
> @@ -20,3 +20,4 @@ nfsd-$(CONFIG_NFSD_V4)	+= nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \
>  nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
>  nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
>  nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
> +nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> diff --git a/fs/nfsd/flexfilelayout.c b/fs/nfsd/flexfilelayout.c
> new file mode 100644
> index 0000000..d28b8a0
> --- /dev/null
> +++ b/fs/nfsd/flexfilelayout.c
> @@ -0,0 +1,148 @@
> +/*
> + * Copyright (c) 2016 Tom Haynes <loghyr@primarydata.com>
> + *
> + * The following implements a super-simple flex-file server
> + * where the NFSv4.1 mds is also the ds. And the storage is
> + * the same. I.e., writing to the mds via a NFSv4.1 WRITE
> + * goes to the same location as the NFSv3 WRITE.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include 
> +
> +#include "flexfilelayoutxdr.h"
> +#include "pnfs.h"
> +
> +#define NFSDDBG_FACILITY	NFSDDBG_PNFS
> +
> +static __be32
> +nfsd4_ff_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
> +		struct nfsd4_layoutget *args)
> +{
> +	struct nfsd4_layout_seg *seg = &args->lg_seg;
> +	u32 block_size = (1 << inode->i_blkbits);
> +	u32 device_generation = 0;
> +	int error;
> +
> +	struct pnfs_ff_layout *fl;
> +
> +	if (seg->offset & (block_size - 1)) {
> +		dprintk("pnfsd: I/O misaligned\n");
> +		goto out_layoutunavailable;
> +	}
> +
> +	/*
> +	 * The super simple flex file server has 1 mirror, 1 data server,
> +	 * and 1 file handle. So instead of 4 allocs, do 1 for now.
> +	 * Zero it out for the stateid - don't want junk in there!
> +	 */
> +	error = -ENOMEM;
> +	fl = kzalloc(sizeof(*fl), GFP_KERNEL);
> +	if (!fl)
> +		goto out_error;
> +	args->lg_content = fl;
> +
> +	/*
> +	 * Avoid layout commit, try to force the I/O to the DS,
> +	 * and for fun, cause all IOMODE_RW layout segments to
> +	 * effectively be WRITE only.
> +	 */
> +	fl->flags = FF_FLAGS_NO_LAYOUTCOMMIT | FF_FLAGS_NO_IO_THRU_MDS |
> +		    FF_FLAGS_NO_READ_IO;
> +
> +	fl->uid = inode->i_uid;
> +	fl->gid = inode->i_gid;
> +
> +	error = nfsd4_set_deviceid(&fl->deviceid, fhp, device_generation);
> +	if (error)
> +		goto out_error;
> +
> +	fl->fh.size = fhp->fh_handle.fh_size;
> +	memcpy(fl->fh.data, &fhp->fh_handle.fh_base, fl->fh.size);
> +
> +	/* Give whole file layout segments */
> +	seg->offset = 0;
> +	seg->length = NFS4_MAX_UINT64;
> +
> +	dprintk("GET: 0x%llx:0x%llx %d\n", seg->offset, seg->length,
> +		seg->iomode);
> +	return 0;
> +
> +out_error:
> +	kfree(fl);
> +	seg->length = 0;
> +	return nfserrno(error);
> +out_layoutunavailable:
> +	seg->length = 0;
> +	return nfserr_layoutunavailable;
> +}
> +
> +#ifdef CONFIG_NFSD_FLEXFILELAYOUT
> +static __be32
> +nfsd4_ff_proc_getdeviceinfo(struct super_block *sb,
> +		struct svc_rqst *rqstp,
> +		struct nfs4_client *clp,
> +		struct nfsd4_getdeviceinfo *gdp)
> +{
> +	struct pnfs_ff_device_addr *da;
> +
> +	u16 port;
> +	char addr[INET6_ADDRSTRLEN];
> +
> +	if (sb->s_bdev != sb->s_bdev->bd_contains)
> +		return nfserr_inval;
> +
> +	da = kzalloc(sizeof(struct pnfs_ff_device_addr), GFP_KERNEL);
> +	if (!da)
> +		return nfserrno(-ENOMEM);
> +
> +	gdp->gd_device = da;
> +
> +	da->version = 3;
> +	da->minor_version = 0;
> +
> +	/* FIXME: Get from export? */
> +	da->rsize = 4096;
> +	da->wsize = 4096;
> +
> +	rpc_ntop((struct sockaddr *)&rqstp->rq_daddr,
> +		 addr, INET6_ADDRSTRLEN);
> +	if (rqstp->rq_daddr.ss_family == AF_INET) {
> +		struct sockaddr_in *sin;
> +
> +		sin = (struct sockaddr_in *)&rqstp->rq_daddr;
> +		port = ntohs(sin->sin_port);
> +		snprintf(da->netaddr.netid, FF_NETID_LEN + 1, "tcp");
> +		da->netaddr.netid_len = 3;
> +	} else {
> +		struct sockaddr_in6 *sin6;
> +
> +		sin6 = (struct sockaddr_in6 *)&rqstp->rq_daddr;
> +		port = ntohs(sin6->sin6_port);
> +		snprintf(da->netaddr.netid, FF_NETID_LEN + 1, "tcp6");
> +		da->netaddr.netid_len = 4;
> +	}
> +
> +	da->netaddr.addr_len =
> +		snprintf(da->netaddr.addr, FF_ADDR_LEN + 1,
> +			 "%s.%hhu.%hhu", addr, port >> 8, port & 0xff);
> +
> +	da->tightly_coupled = false;
> +
> +	return 0;
> +}
> +
> +const struct nfsd4_layout_ops ff_layout_ops = {
> +	.notify_types		=
> +			NOTIFY_DEVICEID4_DELETE | NOTIFY_DEVICEID4_CHANGE,
> +	.proc_getdeviceinfo	= nfsd4_ff_proc_getdeviceinfo,
> +	.encode_getdeviceinfo	= nfsd4_ff_encode_getdeviceinfo,
> +	.proc_layoutget		= nfsd4_ff_proc_layoutget,
> +	.encode_layoutget	= nfsd4_ff_encode_layoutget,
> +};
> +#endif /* CONFIG_NFSD_FLEXFILELAYOUT */
> diff --git a/fs/nfsd/flexfilelayoutxdr.c b/fs/nfsd/flexfilelayoutxdr.c
> new file mode 100644
> index 0000000..9d15ee0
> --- /dev/null
> +++ b/fs/nfsd/flexfilelayoutxdr.c
> @@ -0,0 +1,116 @@
> +/*
> + * Copyright (c) 2016 Tom Haynes <loghyr@primarydata.com>
> + */
> +#include 
> +#include 
> +#include 
> +
> +#include "nfsd.h"
> +#include "flexfilelayoutxdr.h"
> +
> +#define NFSDDBG_FACILITY	NFSDDBG_PNFS
> +
> +struct ff_idmap {
> +	char buf[11];
> +	int len;
> +};
> +
> +__be32
> +nfsd4_ff_encode_layoutget(struct xdr_stream *xdr,
> +		struct nfsd4_layoutget *lgp)
> +{
> +	struct pnfs_ff_layout *fl = lgp->lg_content;
> +	int len, mirror_len, ds_len, fh_len;
> +	__be32 *p;
> +
> +	/*
> +	 * Unlike nfsd4_encode_user, we know these will
> +	 * always be stringified.
> +	 */
> +	struct ff_idmap uid;
> +	struct ff_idmap gid;
> +
> +	fh_len = 4 + fl->fh.size;
> +
> +	uid.len = sprintf(uid.buf, "%u", from_kuid(&init_user_ns, fl->uid));
> +	gid.len = sprintf(gid.buf, "%u", from_kgid(&init_user_ns, fl->gid));
> +
> +	/* 8 + len for recording the length, name, and padding */
> +	ds_len = 20 + sizeof(stateid_opaque_t) + 4 + fh_len +
> +		 8 + uid.len + 8 + gid.len;
> +
> +	mirror_len = 4 + ds_len;
> +
> +	/* The layout segment */
> +	len = 20 + mirror_len;
> +
> +	p = xdr_reserve_space(xdr, sizeof(__be32) + len);
> +	if (!p)
> +		return nfserr_toosmall;
> +
> +	*p++ = cpu_to_be32(len);
> +	p = xdr_encode_hyper(p, 1);		/* stripe unit of 1 */
> +
> +	*p++ = cpu_to_be32(1);			/* single mirror */
> +	*p++ = cpu_to_be32(1);			/* single data server */
> +
> +	p = xdr_encode_opaque_fixed(p, &fl->deviceid,
> +			sizeof(struct nfsd4_deviceid));
> +
> +	*p++ = cpu_to_be32(1);			/* efficiency */
> +
> +	*p++ = cpu_to_be32(fl->stateid.si_generation);
> +	p = xdr_encode_opaque_fixed(p, &fl->stateid.si_opaque,
> +				    sizeof(stateid_opaque_t));
> +
> +	*p++ = cpu_to_be32(1);			/* single file handle */
> +	p = xdr_encode_opaque(p, fl->fh.data, fl->fh.size);
> +
> +	p = xdr_encode_opaque(p, uid.buf, uid.len);
> +	p = xdr_encode_opaque(p, gid.buf, gid.len);
> +
> +	*p++ = cpu_to_be32(fl->flags);
> +	*p++ = cpu_to_be32(0);			/* No stats collect hint */
> +
> +	return 0;
> +}
> +
> +__be32
> +nfsd4_ff_encode_getdeviceinfo(struct xdr_stream *xdr,
> +		struct nfsd4_getdeviceinfo *gdp)
> +{
> +	struct pnfs_ff_device_addr *da = gdp->gd_device;
> +	int len;
> +	int ver_len;
> +	int addr_len;
> +	__be32 *p;
> +
> +	/* len + padding for two strings */
> +	addr_len = 16 + da->netaddr.netid_len + da->netaddr.addr_len;
> +	ver_len = 20;
> +
> +	len = 4 + ver_len + 4 + addr_len;
> +
> +	p = xdr_reserve_space(xdr, len + sizeof(__be32));
> +	if (!p)
> +		return nfserr_resource;
> +
> +	/*
> +	 * Fill in the overall length and number of volumes at the beginning
> +	 * of the layout.
> +	 */
> +	*p++ = cpu_to_be32(len);
> +	*p++ = cpu_to_be32(1);			/* 1 netaddr */
> +	p = xdr_encode_opaque(p, da->netaddr.netid, da->netaddr.netid_len);
> +	p = xdr_encode_opaque(p, da->netaddr.addr, da->netaddr.addr_len);
> +
> +	*p++ = cpu_to_be32(1);			/* 1 versions */
> +
> +	*p++ = cpu_to_be32(da->version);
> +	*p++ = cpu_to_be32(da->minor_version);
> +	*p++ = cpu_to_be32(da->rsize);
> +	*p++ = cpu_to_be32(da->wsize);
> +	*p++ = cpu_to_be32(da->tightly_coupled);
> +
> +	return 0;
> +}
> diff --git a/fs/nfsd/flexfilelayoutxdr.h b/fs/nfsd/flexfilelayoutxdr.h
> new file mode 100644
> index 0000000..40e6d1b
> --- /dev/null
> +++ b/fs/nfsd/flexfilelayoutxdr.h
> @@ -0,0 +1,50 @@
> +/*
> + * Copyright (c) 2016 Tom Haynes <loghyr@primarydata.com>
> + */
> +#ifndef _NFSD_FLEXFILELAYOUTXDR_H
> +#define _NFSD_FLEXFILELAYOUTXDR_H 1
> +
> +#include 
> +#include "xdr4.h"
> +
> +#define FF_FLAGS_NO_LAYOUTCOMMIT 1
> +#define FF_FLAGS_NO_IO_THRU_MDS  2
> +#define FF_FLAGS_NO_READ_IO      4
> +
> +struct iomap;
> +struct xdr_stream;
> +
> +#define FF_NETID_LEN		(4)
> +#define FF_ADDR_LEN		(INET6_ADDRSTRLEN + 1)
> +struct pnfs_ff_netaddr {
> +	char				netid[FF_NETID_LEN + 1];
> +	char				addr[FF_ADDR_LEN + 1];
> +	u32				netid_len;
> +	u32				addr_len;
> +};
> +
> +struct pnfs_ff_device_addr {
> +	struct pnfs_ff_netaddr		netaddr;
> +	u32				version;
> +	u32				minor_version;
> +	u32				rsize;
> +	u32				wsize;
> +	bool				tightly_coupled;
> +};
> +
> +struct pnfs_ff_layout {
> +	u32				flags;
> +	u32				stats_collect_hint;
> +	kuid_t				uid;
> +	kgid_t				gid;
> +	struct nfsd4_deviceid		deviceid;
> +	stateid_t			stateid;
> +	struct nfs_fh			fh;
> +};
> +
> +__be32 nfsd4_ff_encode_getdeviceinfo(struct xdr_stream *xdr,
> +		struct nfsd4_getdeviceinfo *gdp);
> +__be32 nfsd4_ff_encode_layoutget(struct xdr_stream *xdr,
> +		struct nfsd4_layoutget *lgp);
> +
> +#endif /* _NFSD_FLEXFILELAYOUTXDR_H */
> diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
> index 825c7bc..7cbd56a 100644
> --- a/fs/nfsd/nfs4layouts.c
> +++ b/fs/nfsd/nfs4layouts.c
> @@ -27,6 +27,9 @@ static const struct nfsd4_callback_ops nfsd4_cb_layout_ops;
>  static const struct lock_manager_operations nfsd4_layouts_lm_ops;
>  
>  const struct nfsd4_layout_ops *nfsd4_layout_ops[LAYOUT_TYPE_MAX] =  {
> +#ifdef CONFIG_NFSD_FLEXFILELAYOUT
> +	[LAYOUT_FLEX_FILES]	= &ff_layout_ops,
> +#endif
>  #ifdef CONFIG_NFSD_BLOCKLAYOUT
>  	[LAYOUT_BLOCK_VOLUME]	= &bl_layout_ops,
>  #endif
> @@ -122,7 +125,9 @@ nfsd4_set_deviceid(struct nfsd4_deviceid *id, const struct svc_fh *fhp,
>  
>  void nfsd4_setup_layout_type(struct svc_export *exp)
>  {
> +#if defined(CONFIG_NFSD_BLOCKLAYOUT) || defined(CONFIG_NFSD_SCSILAYOUT)
>  	struct super_block *sb = exp->ex_path.mnt->mnt_sb;
> +#endif
>  
>  	if (!(exp->ex_flags & NFSEXP_PNFS))
>  		return;
> @@ -145,6 +150,11 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
>  	    sb->s_bdev && sb->s_bdev->bd_disk->fops->pr_ops)
>  		exp->ex_layout_type = LAYOUT_SCSI;
>  #endif
> +#ifdef CONFIG_NFSD_FLEXFILELAYOUT
> +	// FIXME: How do we "export" this and how does it mingle with
> +	// the above types?
> +	exp->ex_layout_type = LAYOUT_FLEX_FILES;
> +#endif
>  }
>  

Maybe it's time to start thinking about how to support multiple layout
types per export? It doesn't look like it would be that hard. I think
we could convert ex_layout_type into a bitmap that shows which types
are supported.

The harder work looks to be on the client. You'd need some heuristic to
choose when you get back multiple layout types and fix that to work
properly.


>  static void
> diff --git a/fs/nfsd/pnfs.h b/fs/nfsd/pnfs.h
> index e855677..0c2a716 100644
> --- a/fs/nfsd/pnfs.h
> +++ b/fs/nfsd/pnfs.h
> @@ -45,6 +45,9 @@ extern const struct nfsd4_layout_ops bl_layout_ops;
>  #ifdef CONFIG_NFSD_SCSILAYOUT
>  extern const struct nfsd4_layout_ops scsi_layout_ops;
>  #endif
> +#ifdef CONFIG_NFSD_FLEXFILELAYOUT
> +extern const struct nfsd4_layout_ops ff_layout_ops;
> +#endif
>  
>  __be32 nfsd4_preprocess_layout_stateid(struct svc_rqst *rqstp,
>  		struct nfsd4_compound_state *cstate, stateid_t *stateid,


-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: [PATCH 3/4] nfsd: Add a super simple flex file server
  2016-05-25 12:30   ` Jeff Layton
@ 2016-05-25 14:41     ` Thomas Haynes
  2016-05-25 17:42     ` J. Bruce Fields
  1 sibling, 0 replies; 22+ messages in thread
From: Thomas Haynes @ 2016-05-25 14:41 UTC (permalink / raw)
  To: Jeff Layton; +Cc: J. Bruce Fields, Linux NFS Mailing list, hch


> On May 25, 2016, at 5:30 AM, Jeff Layton <jlayton@poochiereds.net> wrote:
> 
> On Tue, 2016-05-24 at 22:09 -0700, Tom Haynes wrote:
>> 
>>  void nfsd4_setup_layout_type(struct svc_export *exp)
>>  {
>> +#if defined(CONFIG_NFSD_BLOCKLAYOUT) || defined(CONFIG_NFSD_SCSILAYOUT)
>>  	struct super_block *sb = exp->ex_path.mnt->mnt_sb;
>> +#endif
>>  
>>  	if (!(exp->ex_flags & NFSEXP_PNFS))
>>  		return;
>> @@ -145,6 +150,11 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
>>  	    sb->s_bdev && sb->s_bdev->bd_disk->fops->pr_ops)
>>  		exp->ex_layout_type = LAYOUT_SCSI;
>>  #endif
>> +#ifdef CONFIG_NFSD_FLEXFILELAYOUT
>> +	// FIXME: How do we "export" this and how does it mingle with
>> +	// the above types?
>> +	exp->ex_layout_type = LAYOUT_FLEX_FILES;
>> +#endif
>>  }
>>  
> 
> Maybe it's time to start thinking about how to support multiple layout types per export? It doesn't look like it would be that hard. I think we could convert ex_layout_type into a bitmap that shows which types are supported.
> 
> The harder work looks to be on the client. You'd need some heuristic to choose when you get back multiple layout types and fix that to work properly.


In thinking about it, if we rearrange the code to be:

void nfsd4_setup_layout_type(struct svc_export *exp)
{
#if defined(CONFIG_NFSD_BLOCKLAYOUT) || defined(CONFIG_NFSD_SCSILAYOUT)
        struct super_block *sb = exp->ex_path.mnt->mnt_sb;
#endif
        
        if (!(exp->ex_flags & NFSEXP_PNFS))
                return;
        
        /*
         * If flex file is configured, use it by default. Otherwise
         * check if the file system supports exporting a block-like layout.
         * If the block device supports reservations prefer the SCSI layout,
         * otherwise advertise the block layout.
         */
#ifdef CONFIG_NFSD_FLEXFILELAYOUT
        // FIXME: How do we "export" this and how does it mingle with
        // the types below?
        exp->ex_layout_type = LAYOUT_FLEX_FILES;
#endif
#ifdef CONFIG_NFSD_BLOCKLAYOUT
        /* overwrite flex file layout selection if needed */
        if (sb->s_export_op->get_uuid &&
            sb->s_export_op->map_blocks &&
            sb->s_export_op->commit_blocks)
                exp->ex_layout_type = LAYOUT_BLOCK_VOLUME;
#endif
#ifdef CONFIG_NFSD_SCSILAYOUT
        /* overwrite block layout selection if needed */
        if (sb->s_export_op->map_blocks && 
            sb->s_export_op->commit_blocks &&
            sb->s_bdev && sb->s_bdev->bd_disk->fops->pr_ops)
                exp->ex_layout_type = LAYOUT_SCSI;
#endif
}

We get what seems natural.
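
To make the resulting precedence explicit, here is a rough userspace model of that ordering. It is only a sketch and not kernel code: struct fs_caps and pick_layout_type() are made-up names, the booleans stand in for the s_export_op and pr_ops checks, all three config options are assumed to be enabled, and the layout type numbers are assumed to match the kernel's enum.

```c
#include <stdbool.h>

/* Layout type numbers; assumed here to match the kernel's values. */
enum layout_type {
	LAYOUT_NONE = 0,
	LAYOUT_BLOCK_VOLUME = 3,
	LAYOUT_FLEX_FILES = 4,
	LAYOUT_SCSI = 5,
};

/* Hypothetical stand-in for what the real code reads from the export. */
struct fs_caps {
	bool pnfs_export;	/* "pnfs" export option is set */
	bool has_get_uuid;	/* s_export_op->get_uuid present */
	bool has_map_blocks;	/* s_export_op->map_blocks present */
	bool has_commit_blocks;	/* s_export_op->commit_blocks present */
	bool has_pr_ops;	/* block device offers persistent reservations */
};

/*
 * Same order as the rearranged function: flex files is the default,
 * block overrides it, and SCSI overrides block.
 */
static enum layout_type pick_layout_type(const struct fs_caps *c)
{
	enum layout_type t = LAYOUT_NONE;

	if (!c->pnfs_export)
		return t;

	t = LAYOUT_FLEX_FILES;
	if (c->has_get_uuid && c->has_map_blocks && c->has_commit_blocks)
		t = LAYOUT_BLOCK_VOLUME;
	if (c->has_map_blocks && c->has_commit_blocks && c->has_pr_ops)
		t = LAYOUT_SCSI;
	return t;
}
```

With this ordering, an export on a file system without any block-layout hooks ends up with flex files, and anything that can do block or SCSI keeps doing so.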




* Re: [PATCH 2/4] nfsd: Can leak pnfs_block_extent on error
  2016-05-25  5:09 ` [PATCH 2/4] nfsd: Can leak pnfs_block_extent on error Tom Haynes
  2016-05-25 11:50   ` Jeff Layton
@ 2016-05-25 15:07   ` Christoph Hellwig
  2016-05-25 18:12     ` Thomas Haynes
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2016-05-25 15:07 UTC (permalink / raw)
  To: Tom Haynes; +Cc: J. Bruce Fields, Linux NFS Mailing list, Christoph Hellwig

On Tue, May 24, 2016 at 10:09:37PM -0700, Tom Haynes wrote:
> Signed-off-by: Tom Haynes <loghyr@primarydata.com>

How was this reported?

Like other NFS procedures the private data should be freed by the
XDR encode callback (nfsd4_encode_layoutget in this case) even
in the error case.  It could be that there is a bug somewhere,
but it probably shouldn't be fixed here.


* Re: [PATCH 1/4] nfsd: flex file device id encoding will need the server addres
  2016-05-25  5:09 ` [PATCH 1/4] nfsd: flex file device id encoding will need the server addres Tom Haynes
  2016-05-25 11:49   ` Jeff Layton
@ 2016-05-25 15:08   ` Christoph Hellwig
  1 sibling, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2016-05-25 15:08 UTC (permalink / raw)
  To: Tom Haynes; +Cc: J. Bruce Fields, Linux NFS Mailing list, Christoph Hellwig

> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index de1ff1d..b28e45b 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1270,6 +1270,7 @@ nfsd4_getdeviceinfo(struct svc_rqst *rqstp,
>  	nfserr = nfs_ok;
>  	if (gdp->gd_maxcount != 0) {
>  		nfserr = ops->proc_getdeviceinfo(exp->ex_path.mnt->mnt_sb,
> +					rqstp,
>  					cstate->session->se_client, gdp);

Can you reindent the code so that the cstate->session->se_client argument
goes onto the same line as rqstp?

Otherwise this looks fine to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 4/4] nfsd: Provide a config option for flex file layouts
  2016-05-25  5:09 ` [PATCH 4/4] nfsd: Provide a config option for flex file layouts Tom Haynes
@ 2016-05-25 15:09   ` Christoph Hellwig
  2016-05-25 18:19     ` Thomas Haynes
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2016-05-25 15:09 UTC (permalink / raw)
  To: Tom Haynes; +Cc: J. Bruce Fields, Linux NFS Mailing list, Christoph Hellwig

> +config NFSD_FLEXFILELAYOUT
> +	bool "NFSv4.1 server support for pNFS Flex File layouts"
> +	depends on NFSD_V4
> +	select NFSD_PNFS
> +	help
> +	  This option enables support for the exporting pNFS Flex File
> +	  layouts in the kernel's NFS server. The pNFS Flex File  layout
> +	  enables NFS clients to directly perform I/O to NFSv3 devices
> +	  accesible to both the server and the clients.  See
> +	  draft-ietf-nfsv4-flex-files for more details.

How about a bit more of a warning that this is just a toy demo server?

Also I'd say merge this into the previous patch.


* Re: [PATCH 3/4] nfsd: Add a super simple flex file server
  2016-05-25  5:09 ` [PATCH 3/4] nfsd: Add a super simple flex file server Tom Haynes
  2016-05-25 12:00   ` Jeff Layton
  2016-05-25 12:30   ` Jeff Layton
@ 2016-05-25 15:15   ` Christoph Hellwig
  2016-05-26  5:37     ` Thomas Haynes
  2 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2016-05-25 15:15 UTC (permalink / raw)
  To: Tom Haynes; +Cc: J. Bruce Fields, Linux NFS Mailing list, Christoph Hellwig

Nice!  A few comments below:

> + * where the NFSv4.1 mds is also the ds. And the storage is
> + * the same. I.e., writing to the mds via a NFSv4.1 WRITE
> + * goes to the same location as the NFSv3 WRITE.
> + */
> +#include <linux/exportfs.h>
> +#include <linux/genhd.h>

> +#include <linux/pr.h>

I don't think you need any of the three headers above.

> +static __be32
> +nfsd4_ff_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
> +		struct nfsd4_layoutget *args)
> +{
> +	struct nfsd4_layout_seg *seg = &args->lg_seg;
> +	u32 block_size = (1 << inode->i_blkbits);
> +	u32 device_generation = 0;
> +	int error;
> +
> +	struct pnfs_ff_layout *fl;
> +
> +	if (seg->offset & (block_size - 1)) {
> +		dprintk("pnfsd: I/O misaligned\n");
> +		goto out_layoutunavailable;
> +	}

Do we really care about aligned I/O for flexfiles layouts?

> +	 * effectively be WRITE only.
> +	 */
> +	fl->flags = FF_FLAGS_NO_LAYOUTCOMMIT | FF_FLAGS_NO_IO_THRU_MDS |
> +		    FF_FLAGS_NO_READ_IO;
> +
> +	fl->uid = inode->i_uid;
> +	fl->gid = inode->i_gid;

Maybe I need to actually read the latest draft, but what's the story
about these on the wire uids/gids?

> +#ifdef CONFIG_NFSD_FLEXFILELAYOUT

I don't think you need this - the whole file is conditional on this
symbol.

> +	if (sb->s_bdev != sb->s_bdev->bd_contains)
> +		return nfserr_inval;

Shouldn't be needed.

> +#include <linux/exportfs.h>

probably not needed.

> +struct iomap;

not needed.

>  void nfsd4_setup_layout_type(struct svc_export *exp)
>  {
> +#if defined(CONFIG_NFSD_BLOCKLAYOUT) || defined(CONFIG_NFSD_SCSILAYOUT)
>  	struct super_block *sb = exp->ex_path.mnt->mnt_sb;
> +#endif
>  
>  	if (!(exp->ex_flags & NFSEXP_PNFS))
>  		return;
> @@ -145,6 +150,11 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
>  	    sb->s_bdev && sb->s_bdev->bd_disk->fops->pr_ops)
>  		exp->ex_layout_type = LAYOUT_SCSI;
>  #endif
> +#ifdef CONFIG_NFSD_FLEXFILELAYOUT
> +	// FIXME: How do we "export" this and how does it mingle with
> +	// the above types?
> +	exp->ex_layout_type = LAYOUT_FLEX_FILES;
> +#endif

As pointed out by Jeff we'll probably need a bitmap of supported layouts
here.  Something like

	unsigned long ex_layout_types;

...

	if (supported)
		ex_layout_types |= (1 << LAYOUT_XXX)

probably best done as a separate preparation patch.
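
Spelled out a little further, the bitmap idea might look like the standalone sketch below. The struct and helper names (export_model, export_add_layout(), export_supports_layout()) are invented for illustration, and the layout type numbers are assumed to match the kernel's enum; the real change would live in struct svc_export.

```c
#include <stdbool.h>

/* Layout type numbers; assumed here to match the kernel's values. */
enum layout_type {
	LAYOUT_BLOCK_VOLUME = 3,
	LAYOUT_FLEX_FILES = 4,
	LAYOUT_SCSI = 5,
};

/* Hypothetical stand-in for the relevant part of struct svc_export. */
struct export_model {
	unsigned long ex_layout_types;	/* one bit per supported layout type */
};

/* Mark a layout type as supported on this export. */
static void export_add_layout(struct export_model *exp, enum layout_type t)
{
	exp->ex_layout_types |= 1UL << t;
}

/* Test whether a layout type is supported on this export. */
static bool export_supports_layout(const struct export_model *exp,
				   enum layout_type t)
{
	return (exp->ex_layout_types & (1UL << t)) != 0;
}
```

Each of the existing #ifdef blocks would then call the add helper instead of overwriting a single value, so no layout type "wins" just because of the order of the checks.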

The other issue is that the Linux client is currently confused when
more than a single layout type is supported - we'll need some sort
of runtime option to choose the layout(s) supported.
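
One possible shape for such a selection, purely as an illustration: walk a preference list and take the first type that is set in the supported bitmap. The function name choose_layout() and the idea of passing the preference order as an array are assumptions made for this sketch; nothing like it exists in the patches under review.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Return the first entry of prefs[] whose bit is set in the supported
 * bitmap, or 0 if none of the preferred layout types is supported.
 * Entries that do not fit in the bitmap are skipped.
 */
static int choose_layout(unsigned long supported, const int *prefs,
			 size_t nprefs)
{
	const int nbits = (int)(sizeof(unsigned long) * CHAR_BIT);
	size_t i;

	for (i = 0; i < nprefs; i++) {
		if (prefs[i] <= 0 || prefs[i] >= nbits)
			continue;
		if (supported & (1UL << prefs[i]))
			return prefs[i];
	}
	return 0;
}
```

The preference order could come from a module parameter or an export option; which side (client or server) applies it is left open here.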


* Re: [PATCH 3/4] nfsd: Add a super simple flex file server
  2016-05-25 12:30   ` Jeff Layton
  2016-05-25 14:41     ` Thomas Haynes
@ 2016-05-25 17:42     ` J. Bruce Fields
  2016-05-25 21:57       ` Jeff Layton
  1 sibling, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2016-05-25 17:42 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Tom Haynes, Linux NFS Mailing list, Christoph Hellwig

On Wed, May 25, 2016 at 08:30:44AM -0400, Jeff Layton wrote:
> Maybe it's time to start thinking about how to support multiple layout
> types per export?

It looks like nobody would want this flex file code in production.  The
only users will be testers and developers.

And the scsi layout is really just a replacement for the block layout,
nobody should be supporting both of those at once either.

Would it be too much of a burden just to make flexfiles testers and
developers build their own kernels?

> It doesn't look like it would be that hard. I think
> we could convert ex_layout_type into a bitmap that shows which types
> are supported.

ex_layout_type is only used internally, the only external interface is
an export flag, so we'd need some new interface.

--b.

> The harder work looks to be on the client. You'd need some heuristic to choose when you get back multiple layout types and fix that to work properly.


* Re: [PATCH 2/4] nfsd: Can leak pnfs_block_extent on error
  2016-05-25 15:07   ` Christoph Hellwig
@ 2016-05-25 18:12     ` Thomas Haynes
  2016-05-25 18:20       ` J. Bruce Fields
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Haynes @ 2016-05-25 18:12 UTC (permalink / raw)
  To: hch; +Cc: Thomas Haynes, J. Bruce Fields, Linux NFS Mailing list


> On May 25, 2016, at 8:07 AM, Christoph Hellwig <hch@lst.de> wrote:
> 
> On Tue, May 24, 2016 at 10:09:37PM -0700, Tom Haynes wrote:
>> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
> 
> How was this reported?

Code inspection. My guess is no one ever hit the error cases
in there.

> 
> Like other NFS procedures the private data should be freed by the
> XDR encode callback (nfsd4_encode_layoutget in this case) even
> in the error case.  It could be that there is a bug somewhere,
> but it probably shouldn't be fixed here.
> 

No, it doesn’t do that on errors:

nfsd4_layoutget():

        nfserr = ops->proc_layoutget(d_inode(current_fh->fh_dentry),
                                     current_fh, lgp);
        if (nfserr)
                goto out_put_stid;

        nfserr = nfsd4_insert_layout(lgp, ls);

out_put_stid:
        mutex_unlock(&ls->ls_mutex);
        nfs4_put_stid(&ls->ls_stid);
out:
        return nfserr;
}

So on error we never do anything with the lgp and the memory would
be dropped.



* Re: [PATCH 4/4] nfsd: Provide a config option for flex file layouts
  2016-05-25 15:09   ` Christoph Hellwig
@ 2016-05-25 18:19     ` Thomas Haynes
  2016-05-25 18:21       ` J. Bruce Fields
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Haynes @ 2016-05-25 18:19 UTC (permalink / raw)
  To: hch; +Cc: Thomas Haynes, J. Bruce Fields, Linux NFS Mailing list


> On May 25, 2016, at 8:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> 
>> +config NFSD_FLEXFILELAYOUT
>> +	bool "NFSv4.1 server support for pNFS Flex File layouts"
>> +	depends on NFSD_V4
>> +	select NFSD_PNFS
>> +	help
>> +	  This option enables support for the exporting pNFS Flex File
>> +	  layouts in the kernel's NFS server. The pNFS Flex File  layout
>> +	  enables NFS clients to directly perform I/O to NFSv3 devices
>> +	  accesible to both the server and the clients.  See
>> +	  draft-ietf-nfsv4-flex-files for more details.
> 
> How about a bit more of a warning that this is just a toy demo server?
> 

Add:

          Warning, this server implements the bare minimum functionality
          needed to be a flex file server - it is more for testing the client
          than for a production server.




> Also I'd say merge this into the previous patch.
> 

Agreed



* Re: [PATCH 2/4] nfsd: Can leak pnfs_block_extent on error
  2016-05-25 18:12     ` Thomas Haynes
@ 2016-05-25 18:20       ` J. Bruce Fields
  0 siblings, 0 replies; 22+ messages in thread
From: J. Bruce Fields @ 2016-05-25 18:20 UTC (permalink / raw)
  To: Thomas Haynes; +Cc: hch, Linux NFS Mailing list

On Wed, May 25, 2016 at 06:12:25PM +0000, Thomas Haynes wrote:
> 
> > On May 25, 2016, at 8:07 AM, Christoph Hellwig <hch@lst.de> wrote:
> > 
> > On Tue, May 24, 2016 at 10:09:37PM -0700, Tom Haynes wrote:
> >> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
> > 
> > How was this reported?
> 
> Code inspection. My guess is no one ever hit the error cases
> in there.
> 
> > 
> > Like other NFS procedures the private data should be freed by the
> > XDR encode callback (nfsd4_encode_layoutget in this case) even
> > in the error case.  It could be that there is a bug somewhere,
> > but it probably shouldn't be fixed here.
> > 
> 
> No, it doesn’t do that on errors:

We have in nfsd4_block_proc_layoutget:

	bex = kzalloc(sizeof(*bex), GFP_KERNEL);
	if (!bex)
		goto out_error;
	args->lg_content = bex;

and then in nfsd4_encode_layoutget:

	kfree(lgp->lg_content);

So, I think we're OK as is?

--b.


> 
> nfsd4_layoutget():
> 
>         nfserr = ops->proc_layoutget(d_inode(current_fh->fh_dentry),
>                                      current_fh, lgp);
>         if (nfserr)
>                 goto out_put_stid;
> 
>         nfserr = nfsd4_insert_layout(lgp, ls);
> 
> out_put_stid:
>         mutex_unlock(&ls->ls_mutex);
>         nfs4_put_stid(&ls->ls_stid);
> out:
>         return nfserr;
> }
> 
> So on error we never do anything with the lgp and the memory would
> be dropped.
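
The disagreement above comes down to who frees lg_content when the proc
step fails after allocating. Below is a minimal userspace sketch of the
ownership pattern Christoph and Bruce describe, assuming the encode
callback runs for both success and error replies. The names lg_args,
proc_layoutget, encode_layoutget, handle_request and live_allocs are
hypothetical stand-ins, not the nfsd functions themselves.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the layoutget argument structure. */
struct lg_args {
	void *lg_content;	/* private data owned by the reply path */
};

static int live_allocs;	/* outstanding allocations, for the demo only */

/* Models the proc step: allocate private data, then possibly fail. */
static int proc_layoutget(struct lg_args *args, int fail_after_alloc)
{
	args->lg_content = calloc(1, 64);
	if (!args->lg_content)
		return -1;
	live_allocs++;
	return fail_after_alloc ? -1 : 0;
}

/* Models the encode step: called for success and error, always frees. */
static int encode_layoutget(struct lg_args *args, int status)
{
	/* A real encoder would emit the reply body here when status == 0. */
	if (args->lg_content) {
		free(args->lg_content);
		live_allocs--;
	}
	args->lg_content = NULL;
	return status;
}

/* Runs one request through both steps and returns the final status. */
static int handle_request(int fail_after_alloc)
{
	struct lg_args args = { 0 };
	int status = proc_layoutget(&args, fail_after_alloc);

	return encode_layoutget(&args, status);
}
```

If the encode step were skipped on error, the allocation made in the
proc step would be leaked, which is the case the patch was worried
about.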


* Re: [PATCH 4/4] nfsd: Provide a config option for flex file layouts
  2016-05-25 18:19     ` Thomas Haynes
@ 2016-05-25 18:21       ` J. Bruce Fields
  0 siblings, 0 replies; 22+ messages in thread
From: J. Bruce Fields @ 2016-05-25 18:21 UTC (permalink / raw)
  To: Thomas Haynes; +Cc: hch, Linux NFS Mailing list

On Wed, May 25, 2016 at 06:19:20PM +0000, Thomas Haynes wrote:
> 
> > On May 25, 2016, at 8:09 AM, Christoph Hellwig <hch@lst.de> wrote:
> > 
> >> +config NFSD_FLEXFILELAYOUT
> >> +	bool "NFSv4.1 server support for pNFS Flex File layouts"
> >> +	depends on NFSD_V4
> >> +	select NFSD_PNFS
> >> +	help
> >> +	  This option enables support for exporting pNFS Flex File
> >> +	  layouts in the kernel's NFS server. The pNFS Flex File layout
> >> +	  enables NFS clients to directly perform I/O to NFSv3 devices
> >> +	  accessible to both the server and the clients.  See
> >> +	  draft-ietf-nfsv4-flex-files for more details.
> > 
> > How about a bit more of a warning that this is just a toy demo server?
> > 
> 
> Add:
> 
>           Warning, this server implements the bare minimum functionality
>           to be a flex file server - it is more for testing the client
>           than for a production server.

I'd leave out the "more", and just say "it is for testing the client,
not for use in production".

Which makes me wonder whether it's even worth merging.

But it's very small and self-contained, so I'm inclined to go ahead,
pending other review.

--b.


* Re: [PATCH 3/4] nfsd: Add a super simple flex file server
  2016-05-25 17:42     ` J. Bruce Fields
@ 2016-05-25 21:57       ` Jeff Layton
  2016-05-26 13:18         ` J. Bruce Fields
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff Layton @ 2016-05-25 21:57 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Tom Haynes, Linux NFS Mailing list, Christoph Hellwig

On Wed, 2016-05-25 at 13:42 -0400, J. Bruce Fields wrote:
> On Wed, May 25, 2016 at 08:30:44AM -0400, Jeff Layton wrote:
> > 
> > Maybe it's time to start thinking about how to support multiple layout
> > types per export?
> It looks like nobody would want this flex file code in production.  The
> only users will be testers and developers.
> 
> And the scsi layout is really just a replacement for the block layout,
> nobody should be supporting both of those at once either.
> 

Well...unless you have a mix of clients that just support block and
some that support scsi. Is that plausible?

> Would it be too much of a burden just to make flexfiles testers and
> developers build their own kernels?
>
> > It doesn't look like it would be that hard. I think
> > we could convert ex_layout_type into a bitmap that shows which types
> > are supported.
> ex_layout_type is only used internally, the only external interface is
> an export flag, so we'd need some new interface.
> 

I was thinking that with the "pnfs" export option you'd just enable any
layouts that the fs supports. So here, you could theoretically allow
nfsd to offer up block, scsi and flexfiles layouts given the right fs,
and leave the decision of the layout type to actually use up to the
client.
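
A rough sketch of the bitmap idea, assuming one bit per layout type
number. The struct and helper names (export_sketch, export_add_layout,
export_has_layout) are hypothetical, and the enum values are the layout
type numbers as I understand them from the pNFS specifications, so
treat them as an assumption rather than a copy of the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout type numbers (assumed values, see the note above). */
enum layout_type {
	LAYOUT_BLOCK_VOLUME = 3,
	LAYOUT_FLEX_FILES = 4,
	LAYOUT_SCSI = 5,
};

/* Hypothetical export record holding a bitmap instead of a single type. */
struct export_sketch {
	uint32_t ex_layout_types;	/* bit N set => layout type N offered */
};

/* Mark a layout type as offered on this export. */
static void export_add_layout(struct export_sketch *exp, enum layout_type t)
{
	exp->ex_layout_types |= UINT32_C(1) << t;
}

/* Check whether a layout type is offered on this export. */
static bool export_has_layout(const struct export_sketch *exp,
			      enum layout_type t)
{
	return (exp->ex_layout_types & (UINT32_C(1) << t)) != 0;
}
```

With something like this, the "pnfs" export option would set every bit
the filesystem can back, and the client would pick among the types
advertised.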

-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: [PATCH 3/4] nfsd: Add a super simple flex file server
  2016-05-25 15:15   ` Christoph Hellwig
@ 2016-05-26  5:37     ` Thomas Haynes
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Haynes @ 2016-05-26  5:37 UTC (permalink / raw)
  To: hch; +Cc: J. Bruce Fields, Linux NFS Mailing list


> On May 25, 2016, at 8:15 AM, Christoph Hellwig <hch@lst.de> wrote:
> 
> 
>> +	 * effectively be WRITE only.
>> +	 */
>> +	fl->flags = FF_FLAGS_NO_LAYOUTCOMMIT | FF_FLAGS_NO_IO_THRU_MDS |
>> +		    FF_FLAGS_NO_READ_IO;
>> +
>> +	fl->uid = inode->i_uid;
>> +	fl->gid = inode->i_gid;
> 
> Maybe I need to actually read the latest draft, but what's the story
> about these on the wire uids/gids?
> 

Since NFSv3 does not grok stateids, this allows us to control access to
the file.

The mode is adjusted such that the owner has read/write and
the group has read access:

loghyr:~ loghyr$ ls -la foo
-rw-r-----  1 loghyr  staff  0 May 25 22:27 foo

When the mds decides to fence off access for the IOMODE_RW
segment, it changes the uid (monotonically increasing reduces the
chance of resetting to some older value). When it wants to fence
off the IOMODE_READ segment, it changes the gid.

So the code above should really be something like:

if (seg->iomode == IOMODE_READ)
        fl->uid = inode->i_uid + 11;
else
        fl->uid = inode->i_uid;

fl->gid = inode->i_gid;

This prevents some client from using the IOMODE_READ segment
to do writes. (I think Jeff just fixed that recently in the client.)

As this patchset has neither fencing nor a remote DS, the synthetic
uid/gid works because the file modes have already determined if
access is to be granted. There are “issues” in that the mode bits
may not be 0640.
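
The fencing scheme described above can be modelled in userspace roughly
as follows. This is only a sketch under stated assumptions: the file
mode is taken to be 0640, plain integers stand in for the kernel's uid
and gid types, and ff_file, ff_creds, seg_creds, may_access, fence_rw
and fence_read are hypothetical names. The offset of 11 just mirrors the
snippet above and has no other meaning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum iomode { IOMODE_READ = 1, IOMODE_RW = 2 };

#define FF_READ_UID_OFFSET 11	/* arbitrary, mirrors the "+ 11" above */

/* Hypothetical view of the file's current owner and group on the DS. */
struct ff_file {
	uint32_t uid;
	uint32_t gid;
};

/* Synthetic credentials handed to the client in a layout segment. */
struct ff_creds {
	uint32_t uid;
	uint32_t gid;
};

/* RW segments get the owner uid; READ segments get a non-owner uid. */
static struct ff_creds seg_creds(const struct ff_file *f, enum iomode mode)
{
	struct ff_creds c;

	c.uid = (mode == IOMODE_READ) ? f->uid + FF_READ_UID_OFFSET : f->uid;
	c.gid = f->gid;
	return c;
}

/* Access check for mode 0640: owner may read/write, group may read. */
static bool may_access(const struct ff_file *f, struct ff_creds c, bool write)
{
	if (c.uid == f->uid)
		return true;
	if (c.gid == f->gid)
		return !write;
	return false;
}

/* Fence RW segments by moving the file to a new, larger uid. */
static void fence_rw(struct ff_file *f)
{
	f->uid++;
}

/* Fence READ segments by moving the file to a new, larger gid. */
static void fence_read(struct ff_file *f)
{
	f->gid++;
}
```

Credentials issued before a fence stop matching the file afterwards, so
the NFSv3 access check on the data server does the enforcement without
any knowledge of stateids.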






* Re: [PATCH 3/4] nfsd: Add a super simple flex file server
  2016-05-25 21:57       ` Jeff Layton
@ 2016-05-26 13:18         ` J. Bruce Fields
  0 siblings, 0 replies; 22+ messages in thread
From: J. Bruce Fields @ 2016-05-26 13:18 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Tom Haynes, Linux NFS Mailing list, Christoph Hellwig

On Wed, May 25, 2016 at 05:57:29PM -0400, Jeff Layton wrote:
> On Wed, 2016-05-25 at 13:42 -0400, J. Bruce Fields wrote:
> > On Wed, May 25, 2016 at 08:30:44AM -0400, Jeff Layton wrote:
> > > 
> > > Maybe it's time to start thinking about how to support multiple layout
> > > types per export?
> > It looks like nobody would want this flex file code in production.  The
> > only users will be testers and developers.
> > 
> > And the scsi layout is really just a replacement for the block layout,
> > nobody should be supporting both of those at once either.
> > 
> 
> Well...unless you have a mix of clients that just support block and
> some that support scsi. Is that plausible?

Maybe so.  I don't think it would be useful to support, though.

(I'd rather people skipped straight to scsi.  And block never got much
use, so I don't think that should be hard.  But if you're really stuck
with some block clients, I suspect you may as well use block for all of
them--the scsi clients could probably do block layout too, and I think
you only get all the advantages of the scsi layout if all your clients
are using it.)

> > Would it be too much of a burden just to make flexfiles testers and
> > developers build their own kernels?
> >
> > > It doesn't look like it would be that hard. I think
> > > we could convert ex_layout_type into a bitmap that shows which types
> > > are supported.
> > ex_layout_type is only used internally, the only external interface is
> > an export flag, so we'd need some new interface.
> > 
> 
> I was thinking that with the "pnfs" export option you'd just enable any
> layouts that the fs supports. So here, you could theoretically allow
> nfsd to offer up block, scsi and flexfiles layouts given the right fs,
> and leave the decision of the layout type to actually use up to the
> client.

OK.

In this particular case that seems less interesting than just the
ability to turn flexfiles on and off from user space, so testers can
enable it without having to build a new kernel.

Hopefully we can figure out how to make this useful in production some
day, and then maybe the multiple-layout support becomes more
interesting.

--b.


end of thread, other threads:[~2016-05-26 13:18 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
2016-05-25  5:09 [PATCH 0/4] Super simple flex file server Tom Haynes
2016-05-25  5:09 ` [PATCH 1/4] nfsd: flex file device id encoding will need the server addres Tom Haynes
2016-05-25 11:49   ` Jeff Layton
2016-05-25 15:08   ` Christoph Hellwig
2016-05-25  5:09 ` [PATCH 2/4] nfsd: Can leak pnfs_block_extent on error Tom Haynes
2016-05-25 11:50   ` Jeff Layton
2016-05-25 15:07   ` Christoph Hellwig
2016-05-25 18:12     ` Thomas Haynes
2016-05-25 18:20       ` J. Bruce Fields
2016-05-25  5:09 ` [PATCH 3/4] nfsd: Add a super simple flex file server Tom Haynes
2016-05-25 12:00   ` Jeff Layton
2016-05-25 12:30   ` Jeff Layton
2016-05-25 14:41     ` Thomas Haynes
2016-05-25 17:42     ` J. Bruce Fields
2016-05-25 21:57       ` Jeff Layton
2016-05-26 13:18         ` J. Bruce Fields
2016-05-25 15:15   ` Christoph Hellwig
2016-05-26  5:37     ` Thomas Haynes
2016-05-25  5:09 ` [PATCH 4/4] nfsd: Provide a config option for flex file layouts Tom Haynes
2016-05-25 15:09   ` Christoph Hellwig
2016-05-25 18:19     ` Thomas Haynes
2016-05-25 18:21       ` J. Bruce Fields
