linux-nfs.vger.kernel.org archive mirror
* [PATCH v3 00/25] add block layout driver to pnfs client
@ 2011-07-27 18:40 Jim Rees
  2011-07-27 18:40 ` [PATCH v3 01/25] pnfs: GETDEVICELIST Jim Rees
                   ` (24 more replies)
  0 siblings, 25 replies; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

This patch set adds a block layout driver to the pNFS client.  It passes
Connectathon tests and is bisectable.  It requires an updated version of
nfs-utils, and patches for that have been sent separately to the nfs-utils
maintainer.

This patch set is also available on the for-trond branch of the git repo at
git://citi.umich.edu/projects/linux-pnfs-blk.git

This is version 3 of this patch set.  Changes since version 2:

- Rebase to trond/nfs-for-next (commit ed1e6211a0).
- Fix Fred's and Benny's email addresses.
- Use FIXME to flag code that needs more work.
- Remove obsolete comments.
- Minor patch re-orgs per reviewer comments.

Changes since version 1:

- NFS41: Drop lseg ref before fallthru to MDS
- SQUASHME: pnfsblock: get rid of vmap and deviceid->area structure
- SQUASHME: pnfsblock: define module alias
- SQUASHME: pnfsblock: bl_find_get_extent optimization: mv break clause to end of loop
- SQUASHME: pnfsblock: test debug bit once for multiple dprintks
- SQUASHME: pnfsblock: typo
- SQUASHME: pnfsblock: get rid of unused leftovers from device mapping removal

Andy Adamson (2):
  pnfs: GETDEVICELIST
  pnfs: cleanup_layoutcommit

Benny Halevy (2):
  pnfs: add set-clear layoutdriver interface
  pnfsblock: use pageio_ops api

Fred Isaman (15):
  pnfs: ask for layout_blksize and save it in nfs_server
  pnfsblock: add blocklayout Kconfig option, Makefile, and stubs
  pnfsblock: basic extent code
  pnfsblock: lseg alloc and free
  pnfsblock: merge extents
  pnfsblock: call and parse getdevicelist
  pnfsblock: xdr decode pnfs_block_layout4
  pnfsblock: bl_find_get_extent
  pnfsblock: add extent manipulation functions
  pnfsblock: merge rw extents
  pnfsblock: encode_layoutcommit
  pnfsblock: cleanup_layoutcommit
  pnfsblock: bl_read_pagelist
  pnfsblock: bl_write_pagelist
  pnfsblock: note written INVAL areas for layoutcommit

Jim Rees (2):
  pnfsblock: add device operations
  pnfsblock: remove device operations

Peng Tao (4):
  NFS41: Let layoutcommit handle multiple segments
  NFS41: save layoutcommit cred after first successful layoutget
  pnfsblock: write_pagelist handle zero invalid extents
  NFS41: Drop lseg ref before fallthru to MDS

 fs/nfs/Kconfig                      |    8 +-
 fs/nfs/Makefile                     |    1 +
 fs/nfs/blocklayout/Makefile         |    5 +
 fs/nfs/blocklayout/blocklayout.c    | 1043 +++++++++++++++++++++++++++++++++++
 fs/nfs/blocklayout/blocklayout.h    |  208 +++++++
 fs/nfs/blocklayout/blocklayoutdev.c |  410 ++++++++++++++
 fs/nfs/blocklayout/blocklayoutdm.c  |  111 ++++
 fs/nfs/blocklayout/extents.c        |  943 +++++++++++++++++++++++++++++++
 fs/nfs/client.c                     |   11 +-
 fs/nfs/nfs4_fs.h                    |    2 +-
 fs/nfs/nfs4filelayout.c             |    2 +-
 fs/nfs/nfs4proc.c                   |   62 ++-
 fs/nfs/nfs4xdr.c                    |  230 +++++++-
 fs/nfs/pnfs.c                       |   86 ++-
 fs/nfs/pnfs.h                       |   30 +-
 include/linux/nfs.h                 |    2 +
 include/linux/nfs4.h                |    1 +
 include/linux/nfs_fs_sb.h           |    4 +-
 include/linux/nfs_xdr.h             |   17 +-
 19 files changed, 3113 insertions(+), 63 deletions(-)
 create mode 100644 fs/nfs/blocklayout/Makefile
 create mode 100644 fs/nfs/blocklayout/blocklayout.c
 create mode 100644 fs/nfs/blocklayout/blocklayout.h
 create mode 100644 fs/nfs/blocklayout/blocklayoutdev.c
 create mode 100644 fs/nfs/blocklayout/blocklayoutdm.c
 create mode 100644 fs/nfs/blocklayout/extents.c

-- 
1.7.4.1



* [PATCH v3 01/25] pnfs: GETDEVICELIST
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 19:33   ` Trond Myklebust
  2011-07-27 18:40 ` [PATCH v3 02/25] pnfs: add set-clear layoutdriver interface Jim Rees
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Andy Adamson <andros@netapp.com>

The block layout driver uses GETDEVICELIST.

Signed-off-by: Andy Adamson <andros@netapp.com>
[pass struct nfs_server * to getdevicelist]
[get machine creds for getdevicelist]
[fix getdevicelist decode sizing]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
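The reply budget in decode_getdevicelist_maxsz can be sanity-checked in user space. The sketch below copies that macro from this patch and supplies the helper constants it depends on; the values of XDR_QUADLEN, NFS4_VERIFIER_SIZE, NFS4_DEVICEID4_SIZE and op_decode_hdr_maxsz are assumptions mirroring the kernel headers of this era, not definitions from this patch. It also compares the budget with the bytes decode_getdevicelist() pulls from the stream (8 + 8 + 4, then num_devs * 16 + 4). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values, mirrored from the kernel headers of this era. */
#define XDR_QUADLEN(l)          (((l) + 3) >> 2)   /* bytes -> 32-bit XDR words */
#define NFS4_VERIFIER_SIZE      8
#define NFS4_DEVICEID4_SIZE     16
#define op_decode_hdr_maxsz     2                  /* opcode + status */
#define decode_verifier_maxsz   XDR_QUADLEN(NFS4_VERIFIER_SIZE)

/* From this patch (fs/nfs/pnfs.h and fs/nfs/nfs4xdr.c). */
#define NFS4_PNFS_GETDEVLIST_MAXNUM 16
#define decode_getdevicelist_maxsz (op_decode_hdr_maxsz + \
				2 /* nfs_cookie4 gdlr_cookie */ + \
				decode_verifier_maxsz \
				  /* verifier4 gdlr_verifier */ + \
				1 /* gdlr_deviceid_list count */ + \
				XDR_QUADLEN(NFS4_PNFS_GETDEVLIST_MAXNUM * \
					    NFS4_DEVICEID4_SIZE) \
				  /* gdlr_deviceid_list */ + \
				1 /* bool gdlr_eof */)

/* Hypothetical helper: bytes decode_getdevicelist() consumes after the op
 * header for n devices: cookie (8) + verifier (8) + count (4), then the
 * n device IDs and the eof flag (4). */
unsigned int getdevicelist_body_bytes(unsigned int n)
{
	return 8 + 8 + 4 + n * NFS4_DEVICEID4_SIZE + 4;
}

void print_getdevicelist_budget(void)
{
	printf("reply budget: %d words, worst-case body: %u bytes\n",
	       decode_getdevicelist_maxsz,
	       getdevicelist_body_bytes(NFS4_PNFS_GETDEVLIST_MAXNUM));
}
```

With these assumed constants the budget is 72 words (288 bytes), and the worst-case body of 280 bytes is exactly the budget minus the 2-word op header, so the sizing is tight but sufficient.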
 fs/nfs/nfs4proc.c       |   48 ++++++++++++++++++
 fs/nfs/nfs4xdr.c        |  128 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/pnfs.h           |   12 ++++
 include/linux/nfs4.h    |    1 +
 include/linux/nfs_xdr.h |   11 ++++
 5 files changed, 200 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 079614d..ebb6f1a 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5834,6 +5834,54 @@ int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp)
 	return status;
 }
 
+/*
+ * Retrieve the list of Data Server devices from the MDS.
+ */
+static int _nfs4_getdevicelist(struct nfs_server *server,
+				    const struct nfs_fh *fh,
+				    struct pnfs_devicelist *devlist)
+{
+	struct nfs4_getdevicelist_args args = {
+		.fh = fh,
+		.layoutclass = server->pnfs_curr_ld->id,
+	};
+	struct nfs4_getdevicelist_res res = {
+		.devlist = devlist,
+	};
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_GETDEVICELIST],
+		.rpc_argp = &args,
+		.rpc_resp = &res,
+	};
+	int status;
+
+	dprintk("--> %s\n", __func__);
+	status = nfs4_call_sync(server->client, server, &msg, &args.seq_args,
+				&res.seq_res, 0);
+	dprintk("<-- %s status=%d\n", __func__, status);
+	return status;
+}
+
+int nfs4_proc_getdevicelist(struct nfs_server *server,
+			    const struct nfs_fh *fh,
+			    struct pnfs_devicelist *devlist)
+{
+	struct nfs4_exception exception = { };
+	int err;
+
+	do {
+		err = nfs4_handle_exception(server,
+				_nfs4_getdevicelist(server, fh, devlist),
+				&exception);
+	} while (exception.retry);
+
+	dprintk("%s: err=%d, num_devs=%u\n", __func__,
+		err, devlist->num_devs);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(nfs4_proc_getdevicelist);
+
 static int
 _nfs4_proc_getdeviceinfo(struct nfs_server *server, struct pnfs_device *pdev)
 {
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index c191a9b..a82dd40 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -314,6 +314,17 @@ static int nfs4_stat_to_errno(int);
 				XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN) + 5)
 #define encode_reclaim_complete_maxsz	(op_encode_hdr_maxsz + 4)
 #define decode_reclaim_complete_maxsz	(op_decode_hdr_maxsz + 4)
+#define encode_getdevicelist_maxsz (op_encode_hdr_maxsz + 4 + \
+				encode_verifier_maxsz)
+#define decode_getdevicelist_maxsz (op_decode_hdr_maxsz + \
+				2 /* nfs_cookie4 gdlr_cookie */ + \
+				decode_verifier_maxsz \
+				  /* verifier4 gdlr_verifier */ + \
+				1 /* gdlr_deviceid_list count */ + \
+				XDR_QUADLEN(NFS4_PNFS_GETDEVLIST_MAXNUM * \
+					    NFS4_DEVICEID4_SIZE) \
+				  /* gdlr_deviceid_list */ + \
+				1 /* bool gdlr_eof */)
 #define encode_getdeviceinfo_maxsz (op_encode_hdr_maxsz + 4 + \
 				XDR_QUADLEN(NFS4_DEVICEID4_SIZE))
 #define decode_getdeviceinfo_maxsz (op_decode_hdr_maxsz + \
@@ -748,6 +759,14 @@ static int nfs4_stat_to_errno(int);
 #define NFS4_dec_reclaim_complete_sz	(compound_decode_hdr_maxsz + \
 					 decode_sequence_maxsz + \
 					 decode_reclaim_complete_maxsz)
+#define NFS4_enc_getdevicelist_sz (compound_encode_hdr_maxsz + \
+				encode_sequence_maxsz + \
+				encode_putfh_maxsz + \
+				encode_getdevicelist_maxsz)
+#define NFS4_dec_getdevicelist_sz (compound_decode_hdr_maxsz + \
+				decode_sequence_maxsz + \
+				decode_putfh_maxsz + \
+				decode_getdevicelist_maxsz)
 #define NFS4_enc_getdeviceinfo_sz (compound_encode_hdr_maxsz +    \
 				encode_sequence_maxsz +\
 				encode_getdeviceinfo_maxsz)
@@ -1855,6 +1874,26 @@ static void encode_sequence(struct xdr_stream *xdr,
 
 #ifdef CONFIG_NFS_V4_1
 static void
+encode_getdevicelist(struct xdr_stream *xdr,
+		     const struct nfs4_getdevicelist_args *args,
+		     struct compound_hdr *hdr)
+{
+	__be32 *p;
+	nfs4_verifier dummy = {
+		.data = "dummmmmy",
+	};
+
+	p = reserve_space(xdr, 20);
+	*p++ = cpu_to_be32(OP_GETDEVICELIST);
+	*p++ = cpu_to_be32(args->layoutclass);
+	*p++ = cpu_to_be32(NFS4_PNFS_GETDEVLIST_MAXNUM);
+	xdr_encode_hyper(p, 0ULL);                          /* cookie */
+	encode_nfs4_verifier(xdr, &dummy);
+	hdr->nops++;
+	hdr->replen += decode_getdevicelist_maxsz;
+}
+
+static void
 encode_getdeviceinfo(struct xdr_stream *xdr,
 		     const struct nfs4_getdeviceinfo_args *args,
 		     struct compound_hdr *hdr)
@@ -2775,6 +2814,24 @@ static void nfs4_xdr_enc_reclaim_complete(struct rpc_rqst *req,
 }
 
 /*
+ * Encode GETDEVICELIST request
+ */
+static void nfs4_xdr_enc_getdevicelist(struct rpc_rqst *req,
+				       struct xdr_stream *xdr,
+				       struct nfs4_getdevicelist_args *args)
+{
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->seq_args, &hdr);
+	encode_putfh(xdr, args->fh, &hdr);
+	encode_getdevicelist(xdr, args, &hdr);
+	encode_nops(&hdr);
+}
+
+/*
  * Encode GETDEVICEINFO request
  */
 static void nfs4_xdr_enc_getdeviceinfo(struct rpc_rqst *req,
@@ -5268,6 +5325,50 @@ out_overflow:
 }
 
 #if defined(CONFIG_NFS_V4_1)
+/*
+ * TODO: handle the case where gdlr_eof is false (more device IDs remain).
+ */
+static int decode_getdevicelist(struct xdr_stream *xdr,
+				struct pnfs_devicelist *res)
+{
+	__be32 *p;
+	int status, i;
+	struct nfs_writeverf verftemp;
+
+	status = decode_op_hdr(xdr, OP_GETDEVICELIST);
+	if (status)
+		return status;
+
+	p = xdr_inline_decode(xdr, 8 + 8 + 4);
+	if (unlikely(!p))
+		goto out_overflow;
+
+	/* TODO: Skip cookie for now */
+	p += 2;
+
+	/* Read verifier */
+	p = xdr_decode_opaque_fixed(p, verftemp.verifier, 8);
+
+	res->num_devs = be32_to_cpup(p);
+
+	dprintk("%s: num_dev %u\n", __func__, res->num_devs);
+
+	if (res->num_devs > NFS4_PNFS_GETDEVLIST_MAXNUM)
+		return -NFS4ERR_REP_TOO_BIG;
+
+	p = xdr_inline_decode(xdr,
+			      res->num_devs * NFS4_DEVICEID4_SIZE + 4);
+	if (unlikely(!p))
+		goto out_overflow;
+	for (i = 0; i < res->num_devs; i++)
+		p = xdr_decode_opaque_fixed(p, res->dev_id[i].data,
+					    NFS4_DEVICEID4_SIZE);
+	res->eof = be32_to_cpup(p);
+	return 0;
+out_overflow:
+	print_overflow_msg(__func__, xdr);
+	return -EIO;
+}
 
 static int decode_getdeviceinfo(struct xdr_stream *xdr,
 				struct pnfs_device *pdev)
@@ -6542,6 +6643,32 @@ static int nfs4_xdr_dec_reclaim_complete(struct rpc_rqst *rqstp,
 }
 
 /*
+ * Decode GETDEVICELIST response
+ */
+static int nfs4_xdr_dec_getdevicelist(struct rpc_rqst *rqstp,
+				      struct xdr_stream *xdr,
+				      struct nfs4_getdevicelist_res *res)
+{
+	struct compound_hdr hdr;
+	int status;
+
+	dprintk("decoding getdevicelist!\n");
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status != 0)
+		goto out;
+	status = decode_sequence(xdr, &res->seq_res, rqstp);
+	if (status != 0)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status != 0)
+		goto out;
+	status = decode_getdevicelist(xdr, res->devlist);
+out:
+	return status;
+}
+
+/*
  * Decode GETDEVINFO response
  */
 static int nfs4_xdr_dec_getdeviceinfo(struct rpc_rqst *rqstp,
@@ -6902,6 +7029,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
 	PROC(GET_LEASE_TIME,	enc_get_lease_time,	dec_get_lease_time),
 	PROC(RECLAIM_COMPLETE,	enc_reclaim_complete,	dec_reclaim_complete),
 	PROC(GETDEVICEINFO,	enc_getdeviceinfo,	dec_getdeviceinfo),
+	PROC(GETDEVICELIST,	enc_getdevicelist,	dec_getdevicelist),
 	PROC(LAYOUTGET,		enc_layoutget,		dec_layoutget),
 	PROC(LAYOUTCOMMIT,	enc_layoutcommit,	dec_layoutcommit),
 	PROC(LAYOUTRETURN,	enc_layoutreturn,	dec_layoutreturn),
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 078670d..ffea314 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -133,14 +133,26 @@ struct pnfs_device {
 	unsigned int  layout_type;
 	unsigned int  mincount;
 	struct page **pages;
+	void          *area;
 	unsigned int  pgbase;
 	unsigned int  pglen;
 };
 
+#define NFS4_PNFS_GETDEVLIST_MAXNUM 16
+
+struct pnfs_devicelist {
+	unsigned int		eof;
+	unsigned int		num_devs;
+	struct nfs4_deviceid	dev_id[NFS4_PNFS_GETDEVLIST_MAXNUM];
+};
+
 extern int pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
 extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
 
 /* nfs4proc.c */
+extern int nfs4_proc_getdevicelist(struct nfs_server *server,
+				   const struct nfs_fh *fh,
+				   struct pnfs_devicelist *devlist);
 extern int nfs4_proc_getdeviceinfo(struct nfs_server *server,
 				   struct pnfs_device *dev);
 extern int nfs4_proc_layoutget(struct nfs4_layoutget *lgp);
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index a3c4bc8..76f99e8 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -566,6 +566,7 @@ enum {
 	NFSPROC4_CLNT_SECINFO_NO_NAME,
 	NFSPROC4_CLNT_TEST_STATEID,
 	NFSPROC4_CLNT_FREE_STATEID,
+	NFSPROC4_CLNT_GETDEVICELIST,
 };
 
 /* nfs41 types */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 5b11595..a07b682 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -235,6 +235,17 @@ struct nfs4_layoutget {
 	gfp_t gfp_flags;
 };
 
+struct nfs4_getdevicelist_args {
+	const struct nfs_fh *fh;
+	u32 layoutclass;
+	struct nfs4_sequence_args seq_args;
+};
+
+struct nfs4_getdevicelist_res {
+	struct pnfs_devicelist *devlist;
+	struct nfs4_sequence_res seq_res;
+};
+
 struct nfs4_getdeviceinfo_args {
 	struct pnfs_device *pdev;
 	struct nfs4_sequence_args seq_args;
-- 
1.7.4.1
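The bounds checks in decode_getdevicelist() can be exercised outside the kernel by replacing struct xdr_stream with a flat big-endian buffer. In this sketch, struct devlist, get_be32() and decode_devlist() are hypothetical stand-ins, and the return codes (-1 for a short buffer, -2 for too many devices) are illustrative rather than the kernel's -EIO and -NFS4ERR_REP_TOO_BIG. Like the patch, it skips the cookie and verifier, rejects counts above NFS4_PNFS_GETDEVLIST_MAXNUM before reading any device IDs, and does not act on eof == 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NFS4_DEVICEID4_SIZE         16
#define NFS4_PNFS_GETDEVLIST_MAXNUM 16

/* Hypothetical user-space copy of struct pnfs_devicelist. */
struct devlist {
	uint32_t eof;
	uint32_t num_devs;
	unsigned char dev_id[NFS4_PNFS_GETDEVLIST_MAXNUM][NFS4_DEVICEID4_SIZE];
};

static uint32_t get_be32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Decode GETDEVICELIST4resok (after the op header) from buf[0..len).
 * Returns 0 on success, -1 if the buffer is too short, -2 if the server
 * sent more device IDs than the client can hold. */
int decode_devlist(const unsigned char *buf, size_t len, struct devlist *res)
{
	size_t need = 8 + 8 + 4;	/* cookie + verifier + count */
	uint32_t i;

	if (len < need)
		return -1;
	/* Cookie (bytes 0-7) and verifier (bytes 8-15) are skipped. */
	res->num_devs = get_be32(buf + 16);
	if (res->num_devs > NFS4_PNFS_GETDEVLIST_MAXNUM)
		return -2;
	need += (size_t)res->num_devs * NFS4_DEVICEID4_SIZE + 4; /* IDs + eof */
	if (len < need)
		return -1;
	for (i = 0; i < res->num_devs; i++)
		memcpy(res->dev_id[i], buf + 20 + i * NFS4_DEVICEID4_SIZE,
		       NFS4_DEVICEID4_SIZE);
	res->eof = get_be32(buf + 20 + res->num_devs * NFS4_DEVICEID4_SIZE);
	return 0;
}
```

Checking the count against the fixed-size array before the second length check is what keeps a hostile or buggy server from overrunning dev_id[].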



* [PATCH v3 02/25] pnfs: add set-clear layoutdriver interface
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
  2011-07-27 18:40 ` [PATCH v3 01/25] pnfs: GETDEVICELIST Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 19:36   ` Trond Myklebust
  2011-07-27 18:40 ` [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments Jim Rees
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Benny Halevy <bhalevy@panasas.com>

Allow the layout driver to issue GETDEVICELIST at mount time and to clean
up at umount time.

[fixup non NFS_V4_1 set_pnfs_layoutdriver definition]
[pnfs: pass mntfh down the init_pnfs path]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
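The control flow this patch adds to set_pnfs_layoutdriver() and unset_pnfs_layoutdriver() can be modeled in user space. All names here are hypothetical, the integer refcount stands in for the module reference, and the mount filehandle argument is omitted. The points illustrated are that both hooks are optional, that a failing set_layoutdriver() drops the driver reference and leaves pNFS disabled, and that clear_layoutdriver() runs before the reference is released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct server;

/* Hypothetical model of struct pnfs_layoutdriver_type. */
struct layoutdriver {
	unsigned int id;
	int refcount;                                /* stands in for the module ref */
	int (*set_layoutdriver)(struct server *);    /* optional, may be NULL */
	int (*clear_layoutdriver)(struct server *);  /* optional, may be NULL */
};

/* Hypothetical model of the relevant part of struct nfs_server. */
struct server {
	struct layoutdriver *curr_ld;   /* NULL means pNFS is not in use */
	int devices_listed;             /* set by the demo driver's init hook */
};

void set_ld(struct server *s, struct layoutdriver *ld)
{
	ld->refcount++;                 /* like try_module_get() */
	s->curr_ld = ld;
	if (ld->set_layoutdriver && ld->set_layoutdriver(s)) {
		fprintf(stderr, "layout driver %u: init failed\n", ld->id);
		ld->refcount--;         /* like module_put() */
		s->curr_ld = NULL;      /* fall back to non-pNFS I/O */
	}
}

void unset_ld(struct server *s)
{
	if (s->curr_ld) {
		if (s->curr_ld->clear_layoutdriver)
			s->curr_ld->clear_layoutdriver(s); /* before dropping the ref */
		s->curr_ld->refcount--;
	}
	s->curr_ld = NULL;
}

/* Demo hooks for a driver that would list devices at mount time. */
int demo_set_ok(struct server *s)   { s->devices_listed = 1; return 0; }
int demo_set_fail(struct server *s) { (void)s; return -1; }
int demo_clear(struct server *s)    { s->devices_listed = 0; return 0; }
```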
 fs/nfs/client.c |    8 +++++---
 fs/nfs/pnfs.c   |   16 ++++++++++++++--
 fs/nfs/pnfs.h   |    8 ++++++--
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 19ea7d9..a9b1848 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -904,7 +904,9 @@ error:
 /*
  * Load up the server record from information gained in an fsinfo record
  */
-static void nfs_server_set_fsinfo(struct nfs_server *server, struct nfs_fsinfo *fsinfo)
+static void nfs_server_set_fsinfo(struct nfs_server *server,
+				  struct nfs_fh *mntfh,
+				  struct nfs_fsinfo *fsinfo)
 {
 	unsigned long max_rpc_payload;
 
@@ -934,7 +936,7 @@ static void nfs_server_set_fsinfo(struct nfs_server *server, struct nfs_fsinfo *
 	if (server->wsize > NFS_MAX_FILE_IO_SIZE)
 		server->wsize = NFS_MAX_FILE_IO_SIZE;
 	server->wpages = (server->wsize + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-	set_pnfs_layoutdriver(server, fsinfo->layouttype);
+	set_pnfs_layoutdriver(server, mntfh, fsinfo->layouttype);
 
 	server->wtmult = nfs_block_bits(fsinfo->wtmult, NULL);
 
@@ -980,7 +982,7 @@ static int nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, str
 	if (error < 0)
 		goto out_error;
 
-	nfs_server_set_fsinfo(server, &fsinfo);
+	nfs_server_set_fsinfo(server, mntfh, &fsinfo);
 
 	/* Get some general file system info */
 	if (server->namelen == 0) {
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 38e5508..8e72724 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -76,8 +76,11 @@ find_pnfs_driver(u32 id)
 void
 unset_pnfs_layoutdriver(struct nfs_server *nfss)
 {
-	if (nfss->pnfs_curr_ld)
+	if (nfss->pnfs_curr_ld) {
+		if (nfss->pnfs_curr_ld->clear_layoutdriver)
+			nfss->pnfs_curr_ld->clear_layoutdriver(nfss);
 		module_put(nfss->pnfs_curr_ld->owner);
+	}
 	nfss->pnfs_curr_ld = NULL;
 }
 
@@ -88,7 +91,8 @@ unset_pnfs_layoutdriver(struct nfs_server *nfss)
  * @id layout type. Zero (illegal layout type) indicates pNFS not in use.
  */
 void
-set_pnfs_layoutdriver(struct nfs_server *server, u32 id)
+set_pnfs_layoutdriver(struct nfs_server *server, const struct nfs_fh *mntfh,
+		      u32 id)
 {
 	struct pnfs_layoutdriver_type *ld_type = NULL;
 
@@ -115,6 +119,14 @@ set_pnfs_layoutdriver(struct nfs_server *server, u32 id)
 		goto out_no_driver;
 	}
 	server->pnfs_curr_ld = ld_type;
+	if (ld_type->set_layoutdriver
+	    && ld_type->set_layoutdriver(server, mntfh)) {
+		printk(KERN_ERR
+		       "%s: Error initializing mount point for layout driver %u.\n",
+		       __func__, id);
+		module_put(ld_type->owner);
+		goto out_no_driver;
+	}
 
 	dprintk("%s: pNFS module for %u set\n", __func__, id);
 	return;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index ffea314..23d8267 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -80,6 +80,9 @@ struct pnfs_layoutdriver_type {
 	struct module *owner;
 	unsigned flags;
 
+	int (*set_layoutdriver) (struct nfs_server *, const struct nfs_fh *);
+	int (*clear_layoutdriver) (struct nfs_server *);
+
 	struct pnfs_layout_hdr * (*alloc_layout_hdr) (struct inode *inode, gfp_t gfp_flags);
 	void (*free_layout_hdr) (struct pnfs_layout_hdr *);
 
@@ -165,7 +168,7 @@ void put_lseg(struct pnfs_layout_segment *lseg);
 bool pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *);
 bool pnfs_pageio_init_write(struct nfs_pageio_descriptor *, struct inode *, int);
 
-void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
+void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh *, u32);
 void unset_pnfs_layoutdriver(struct nfs_server *);
 void pnfs_generic_pg_init_read(struct nfs_pageio_descriptor *, struct nfs_page *);
 int pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc);
@@ -372,7 +375,8 @@ pnfs_roc_drain(struct inode *ino, u32 *barrier)
 	return false;
 }
 
-static inline void set_pnfs_layoutdriver(struct nfs_server *s, u32 id)
+static inline void set_pnfs_layoutdriver(struct nfs_server *s,
+					 const struct nfs_fh *mntfh, u32 id)
 {
 }
 
-- 
1.7.4.1



* [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
  2011-07-27 18:40 ` [PATCH v3 01/25] pnfs: GETDEVICELIST Jim Rees
  2011-07-27 18:40 ` [PATCH v3 02/25] pnfs: add set-clear layoutdriver interface Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 20:05   ` Boaz Harrosh
  2011-07-27 18:40 ` [PATCH v3 04/25] NFS41: save layoutcommit cred after first successful layoutget Jim Rees
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Peng Tao <bergwolf@gmail.com>

Some layout drivers, such as the block layout driver, can have multiple
layout segments per file, and the generic code should handle that.
LAYOUTCOMMIT now takes a list of segments, and the last write offset is
saved once per inode, in the layout header.

Signed-off-by: Peng Tao <peng_tao@emc.com>
---
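The bookkeeping introduced here can be sketched in user space with plain structs in place of pnfs_layout_segment and pnfs_layout_hdr. Each segment written since the last LAYOUTCOMMIT is flagged and pinned once (as NFS_LSEG_LAYOUTCOMMIT does), the highest write end offset is kept once per file (as plh_lwb is), and the release step unpins exactly the flagged segments. All names are hypothetical, and the i_lock serialization used by the patch is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum iomode { IOMODE_READ = 1, IOMODE_RW = 2 };   /* layoutiomode4 values */

/* Hypothetical stand-in for struct pnfs_layout_segment. */
struct seg {
	enum iomode iomode;
	bool lc_flag;          /* like NFS_LSEG_LAYOUTCOMMIT */
	int refcount;          /* like pls_refcount */
	struct seg *lc_next;   /* like pls_lc_list */
};

/* Hypothetical stand-in for struct pnfs_layout_hdr. */
struct layout_hdr {
	long long lwb;         /* like plh_lwb: highest write end offset */
	bool commit_pending;   /* like NFS_INO_LAYOUTCOMMIT */
};

/* After a write through seg that ended at end_pos (cf. pnfs_set_layoutcommit). */
void set_layoutcommit(struct layout_hdr *lo, struct seg *seg, long long end_pos)
{
	lo->commit_pending = true;
	if (!seg->lc_flag) {   /* pin each segment only once per commit cycle */
		seg->lc_flag = true;
		seg->refcount++;
	}
	if (end_pos > lo->lwb)
		lo->lwb = end_pos;
}

/* Collect flagged RW segments by prepending, as list_add() does
 * (cf. pnfs_list_write_lseg). */
struct seg *list_write_segs(struct seg *segs, size_t n)
{
	struct seg *head = NULL;

	for (size_t i = 0; i < n; i++) {
		if (segs[i].iomode == IOMODE_RW && segs[i].lc_flag) {
			segs[i].lc_next = head;
			head = &segs[i];
		}
	}
	return head;
}

/* Unpin every flagged segment on the list (cf. nfs4_layoutcommit_release). */
void release_segs(struct seg *head)
{
	while (head) {
		struct seg *next = head->lc_next;

		head->lc_next = NULL;
		if (head->lc_flag) {
			head->lc_flag = false;
			head->refcount--;
		}
		head = next;
	}
}
```

Keeping a single per-file offset is what lets encode_layoutcommit() send offset 0 and length lastbytewritten + 1, regardless of how many segments were written.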
 fs/nfs/nfs4filelayout.c |    2 +-
 fs/nfs/nfs4proc.c       |    8 ++++++-
 fs/nfs/nfs4xdr.c        |    2 +-
 fs/nfs/pnfs.c           |   50 ++++++++++++++++++++++++----------------------
 fs/nfs/pnfs.h           |    5 +++-
 include/linux/nfs_xdr.h |    2 +-
 6 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index be93a62..e8915d4 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -170,7 +170,7 @@ filelayout_set_layoutcommit(struct nfs_write_data *wdata)
 
 	pnfs_set_layoutcommit(wdata);
 	dprintk("%s ionde %lu pls_end_pos %lu\n", __func__, wdata->inode->i_ino,
-		(unsigned long) wdata->lseg->pls_end_pos);
+		(unsigned long) NFS_I(wdata->inode)->layout->plh_lwb);
 }
 
 /*
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ebb6f1a..af32d3d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5960,9 +5960,15 @@ nfs4_layoutcommit_done(struct rpc_task *task, void *calldata)
 static void nfs4_layoutcommit_release(void *calldata)
 {
 	struct nfs4_layoutcommit_data *data = calldata;
+	struct pnfs_layout_segment *lseg, *tmp;
 
 	/* Matched by references in pnfs_set_layoutcommit */
-	put_lseg(data->lseg);
+	list_for_each_entry_safe(lseg, tmp, &data->lseg_list, pls_lc_list) {
+		list_del_init(&lseg->pls_lc_list);
+		if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT,
+				       &lseg->pls_flags))
+			put_lseg(lseg);
+	}
 	put_rpccred(data->cred);
 	kfree(data);
 }
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index a82dd40..c149cbe 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1955,7 +1955,7 @@ encode_layoutcommit(struct xdr_stream *xdr,
 	*p++ = cpu_to_be32(OP_LAYOUTCOMMIT);
 	/* Only whole file layouts */
 	p = xdr_encode_hyper(p, 0); /* offset */
-	p = xdr_encode_hyper(p, NFS4_MAX_UINT64); /* length */
+	p = xdr_encode_hyper(p, args->lastbytewritten + 1);	/* length */
 	*p++ = cpu_to_be32(0); /* reclaim */
 	p = xdr_encode_opaque_fixed(p, args->stateid.data, NFS4_STATEID_SIZE);
 	*p++ = cpu_to_be32(1); /* newoffset = TRUE */
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 8e72724..dae19dd 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -236,6 +236,7 @@ static void
 init_lseg(struct pnfs_layout_hdr *lo, struct pnfs_layout_segment *lseg)
 {
 	INIT_LIST_HEAD(&lseg->pls_list);
+	INIT_LIST_HEAD(&lseg->pls_lc_list);
 	atomic_set(&lseg->pls_refcount, 1);
 	smp_mb();
 	set_bit(NFS_LSEG_VALID, &lseg->pls_flags);
@@ -1362,16 +1363,17 @@ pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
 EXPORT_SYMBOL_GPL(pnfs_generic_pg_readpages);
 
 /*
- * Currently there is only one (whole file) write lseg.
+ * There can be multiple RW segments.
  */
-static struct pnfs_layout_segment *pnfs_list_write_lseg(struct inode *inode)
+static void pnfs_list_write_lseg(struct inode *inode, struct list_head *listp)
 {
-	struct pnfs_layout_segment *lseg, *rv = NULL;
+	struct pnfs_layout_segment *lseg;
 
-	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list)
-		if (lseg->pls_range.iomode == IOMODE_RW)
-			rv = lseg;
-	return rv;
+	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list) {
+		if (lseg->pls_range.iomode == IOMODE_RW &&
+		    test_bit(NFS_LSEG_LAYOUTCOMMIT, &lseg->pls_flags))
+			list_add(&lseg->pls_lc_list, listp);
+	}
 }
 
 void
@@ -1383,17 +1385,21 @@ pnfs_set_layoutcommit(struct nfs_write_data *wdata)
 
 	spin_lock(&nfsi->vfs_inode.i_lock);
 	if (!test_and_set_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
-		/* references matched in nfs4_layoutcommit_release */
-		get_lseg(wdata->lseg);
-		wdata->lseg->pls_lc_cred =
-			get_rpccred(wdata->args.context->state->owner->so_cred);
 		mark_as_dirty = true;
+		nfsi->layout->plh_lc_cred =
+			get_rpccred(wdata->args.context->state->owner->so_cred);
 		dprintk("%s: Set layoutcommit for inode %lu ",
 			__func__, wdata->inode->i_ino);
 	}
-	if (end_pos > wdata->lseg->pls_end_pos)
-		wdata->lseg->pls_end_pos = end_pos;
+	if (!test_and_set_bit(NFS_LSEG_LAYOUTCOMMIT, &wdata->lseg->pls_flags)) {
+		/* references matched in nfs4_layoutcommit_release */
+		get_lseg(wdata->lseg);
+	}
+	if (end_pos > nfsi->layout->plh_lwb)
+		nfsi->layout->plh_lwb = end_pos;
 	spin_unlock(&nfsi->vfs_inode.i_lock);
+	dprintk("%s: lseg %p end_pos %llu\n",
+		__func__, wdata->lseg, nfsi->layout->plh_lwb);
 
 	/* if pnfs_layoutcommit_inode() runs between inode locks, the next one
 	 * will be a noop because NFS_INO_LAYOUTCOMMIT will not be set */
@@ -1415,7 +1421,6 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync)
 {
 	struct nfs4_layoutcommit_data *data;
 	struct nfs_inode *nfsi = NFS_I(inode);
-	struct pnfs_layout_segment *lseg;
 	struct rpc_cred *cred;
 	loff_t end_pos;
 	int status = 0;
@@ -1433,29 +1438,26 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync)
 		goto out;
 	}
 
+	INIT_LIST_HEAD(&data->lseg_list);
 	spin_lock(&inode->i_lock);
 	if (!test_and_clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
 		spin_unlock(&inode->i_lock);
 		kfree(data);
 		goto out;
 	}
-	/*
-	 * Currently only one (whole file) write lseg which is referenced
-	 * in pnfs_set_layoutcommit and will be found.
-	 */
-	lseg = pnfs_list_write_lseg(inode);
 
-	end_pos = lseg->pls_end_pos;
-	cred = lseg->pls_lc_cred;
-	lseg->pls_end_pos = 0;
-	lseg->pls_lc_cred = NULL;
+	pnfs_list_write_lseg(inode, &data->lseg_list);
+
+	end_pos = nfsi->layout->plh_lwb;
+	cred = nfsi->layout->plh_lc_cred;
+	nfsi->layout->plh_lwb = 0;
+	nfsi->layout->plh_lc_cred = NULL;
 
 	memcpy(&data->args.stateid.data, nfsi->layout->plh_stateid.data,
 		sizeof(nfsi->layout->plh_stateid.data));
 	spin_unlock(&inode->i_lock);
 
 	data->args.inode = inode;
-	data->lseg = lseg;
 	data->cred = cred;
 	nfs_fattr_init(&data->fattr);
 	data->args.bitmask = NFS_SERVER(inode)->cache_consistency_bitmask;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 23d8267..5f1b532 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -36,16 +36,17 @@
 enum {
 	NFS_LSEG_VALID = 0,	/* cleared when lseg is recalled/returned */
 	NFS_LSEG_ROC,		/* roc bit received from server */
+	NFS_LSEG_LAYOUTCOMMIT,	/* layoutcommit bit set for layoutcommit */
 };
 
 struct pnfs_layout_segment {
 	struct list_head pls_list;
+	struct list_head pls_lc_list;
 	struct pnfs_layout_range pls_range;
 	atomic_t pls_refcount;
 	unsigned long pls_flags;
 	struct pnfs_layout_hdr *pls_layout;
 	struct rpc_cred	*pls_lc_cred; /* LAYOUTCOMMIT credential */
-	loff_t pls_end_pos; /* LAYOUTCOMMIT write end */
 };
 
 enum pnfs_try_status {
@@ -128,6 +129,8 @@ struct pnfs_layout_hdr {
 	unsigned long		plh_block_lgets; /* block LAYOUTGET if >0 */
 	u32			plh_barrier; /* ignore lower seqids */
 	unsigned long		plh_flags;
+	loff_t			plh_lwb; /* last write byte for layoutcommit */
+	struct rpc_cred		*plh_lc_cred; /* layoutcommit cred */
 	struct inode		*plh_inode;
 };
 
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index a07b682..21f333e 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -273,7 +273,7 @@ struct nfs4_layoutcommit_res {
 struct nfs4_layoutcommit_data {
 	struct rpc_task task;
 	struct nfs_fattr fattr;
-	struct pnfs_layout_segment *lseg;
+	struct list_head lseg_list;
 	struct rpc_cred *cred;
 	struct nfs4_layoutcommit_args args;
 	struct nfs4_layoutcommit_res res;
-- 
1.7.4.1



* [PATCH v3 04/25] NFS41: save layoutcommit cred after first successful layoutget
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
                   ` (2 preceding siblings ...)
  2011-07-27 18:40 ` [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 19:47   ` Trond Myklebust
  2011-07-27 18:40 ` [PATCH v3 05/25] pnfs: ask for layout_blksize and save it in nfs_server Jim Rees
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Peng Tao <bergwolf@gmail.com>

Take the layoutcommit credential after the first successful LAYOUTGET and
save it in the layout header instead of in every segment.

Signed-off-by: Peng Tao <peng_tao@emc.com>
---
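Moving the credential into the layout header changes who owns which reference. A minimal user-space model, with a plain integer count in place of struct rpc_cred and hypothetical function names: the header holds one reference from the first successful LAYOUTGET until it is destroyed, and each LAYOUTCOMMIT takes and later drops its own. The get/put helpers are assumed to be NULL-tolerant, which the destroy path needs if no LAYOUTGET ever succeeded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted credential standing in for struct rpc_cred. */
struct cred { int count; };

/* NULL-tolerant get/put (an assumption of this sketch). */
struct cred *get_cred(struct cred *c) { if (c) c->count++; return c; }
void put_cred(struct cred *c)         { if (c) c->count--; }

struct layout_hdr { struct cred *lc_cred; };   /* like plh_lc_cred */

/* First successful LAYOUTGET: the header takes its own reference. */
void on_first_layoutget(struct layout_hdr *lo, struct cred *owner_cred)
{
	lo->lc_cred = get_cred(owner_cred);
}

/* Each LAYOUTCOMMIT holds an extra reference for the life of the RPC. */
struct cred *start_layoutcommit(struct layout_hdr *lo)
{
	return get_cred(lo->lc_cred);
}

void layoutcommit_release(struct cred *c) { put_cred(c); }

/* Freeing the header drops the reference taken at LAYOUTGET time. */
void free_layout_hdr(struct layout_hdr *lo)
{
	put_cred(lo->lc_cred);
	lo->lc_cred = NULL;
}
```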
 fs/nfs/pnfs.c |   12 ++++++------
 fs/nfs/pnfs.h |    1 -
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index dae19dd..fbebd2a 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -211,6 +211,7 @@ destroy_layout_hdr(struct pnfs_layout_hdr *lo)
 	dprintk("%s: freeing layout cache %p\n", __func__, lo);
 	BUG_ON(!list_empty(&lo->plh_layouts));
 	NFS_I(lo->plh_inode)->layout = NULL;
+	put_rpccred(lo->plh_lc_cred);
 	pnfs_free_layout_hdr(lo);
 }
 
@@ -1007,6 +1008,10 @@ pnfs_update_layout(struct inode *ino,
 		list_del_init(&lo->plh_layouts);
 		spin_unlock(&clp->cl_lock);
 	}
+	if (first) {
+		lo->plh_lc_cred =
+			get_rpccred(ctx->state->owner->so_cred);
+	}
 	atomic_dec(&lo->plh_outstanding);
 	put_layout_hdr(lo);
 out:
@@ -1386,8 +1391,6 @@ pnfs_set_layoutcommit(struct nfs_write_data *wdata)
 	spin_lock(&nfsi->vfs_inode.i_lock);
 	if (!test_and_set_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
 		mark_as_dirty = true;
-		nfsi->layout->plh_lc_cred =
-			get_rpccred(wdata->args.context->state->owner->so_cred);
 		dprintk("%s: Set layoutcommit for inode %lu ",
 			__func__, wdata->inode->i_ino);
 	}
@@ -1421,7 +1424,6 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync)
 {
 	struct nfs4_layoutcommit_data *data;
 	struct nfs_inode *nfsi = NFS_I(inode);
-	struct rpc_cred *cred;
 	loff_t end_pos;
 	int status = 0;
 
@@ -1449,16 +1451,14 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync)
 	pnfs_list_write_lseg(inode, &data->lseg_list);
 
 	end_pos = nfsi->layout->plh_lwb;
-	cred = nfsi->layout->plh_lc_cred;
 	nfsi->layout->plh_lwb = 0;
-	nfsi->layout->plh_lc_cred = NULL;
 
 	memcpy(&data->args.stateid.data, nfsi->layout->plh_stateid.data,
 		sizeof(nfsi->layout->plh_stateid.data));
 	spin_unlock(&inode->i_lock);
 
 	data->args.inode = inode;
-	data->cred = cred;
+	data->cred = get_rpccred(nfsi->layout->plh_lc_cred);
 	nfs_fattr_init(&data->fattr);
 	data->args.bitmask = NFS_SERVER(inode)->cache_consistency_bitmask;
 	data->res.fattr = &data->fattr;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 5f1b532..bddd8b9 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -46,7 +46,6 @@ struct pnfs_layout_segment {
 	atomic_t pls_refcount;
 	unsigned long pls_flags;
 	struct pnfs_layout_hdr *pls_layout;
-	struct rpc_cred	*pls_lc_cred; /* LAYOUTCOMMIT credential */
 };
 
 enum pnfs_try_status {
-- 
1.7.4.1



* [PATCH v3 05/25] pnfs: ask for layout_blksize and save it in nfs_server
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
                   ` (3 preceding siblings ...)
  2011-07-27 18:40 ` [PATCH v3 04/25] NFS41: save layoutcommit cred after first successful layoutget Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 18:40 ` [PATCH v3 06/25] pnfs: cleanup_layoutcommit Jim Rees
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

The block layout driver needs it to determine the I/O size.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Tao Guo <glorioustao@gmail.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
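Why nfs4_fsinfo_bitmap grows from two words to three: NFSv4 attribute number n lives in bitmap word n / 32, bit n % 32. The sketch below uses attribute numbers as assigned in RFC 5661 (lease_time 10, maxfilesize 27, maxread 30, maxwrite 31, time_delta 51, fs_layout_type 62, layout_blksize 65); these constants and the helper names are assumptions for illustration, not kernel definitions. layout_blksize (65) is the first attribute in this request that falls into word 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* NFSv4 attribute numbers per RFC 5661 (assumed here, not from this patch). */
#define FATTR4_LEASE_TIME       10
#define FATTR4_MAXFILESIZE      27
#define FATTR4_MAXREAD          30
#define FATTR4_MAXWRITE         31
#define FATTR4_TIME_DELTA       51
#define FATTR4_FS_LAYOUT_TYPES  62
#define FATTR4_LAYOUT_BLKSIZE   65

/* Set the bit for attribute attr in a bitmap of nwords 32-bit words.
 * Returns -1 if the attribute does not fit in the bitmap. */
int bitmap_set(uint32_t *bm, unsigned int nwords, unsigned int attr)
{
	if (attr / 32 >= nwords)
		return -1;
	bm[attr / 32] |= UINT32_C(1) << (attr % 32);
	return 0;
}

/* bitmap4 is a variable-length XDR array, so trailing zero words can be
 * omitted on the wire; this returns how many words are actually needed. */
unsigned int bitmap_len(const uint32_t *bm, unsigned int nwords)
{
	while (nwords > 0 && bm[nwords - 1] == 0)
		nwords--;
	return nwords;
}

/* Build the fsinfo request bitmap as it looks after this patch. */
int build_fsinfo_bitmap(uint32_t bm[3])
{
	static const unsigned int attrs[] = {
		FATTR4_LEASE_TIME, FATTR4_MAXFILESIZE, FATTR4_MAXREAD,
		FATTR4_MAXWRITE, FATTR4_TIME_DELTA, FATTR4_FS_LAYOUT_TYPES,
		FATTR4_LAYOUT_BLKSIZE,
	};

	bm[0] = bm[1] = bm[2] = 0;
	for (size_t i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++)
		if (bitmap_set(bm, 3, attrs[i]) < 0)
			return -1;
	return 0;
}
```

With a two-word array, bitmap_set() rejects attribute 65, which is the reason the declaration in nfs4_fs.h changes from [2] to [3].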
 fs/nfs/client.c           |    1 +
 fs/nfs/nfs4_fs.h          |    2 +-
 fs/nfs/nfs4proc.c         |    5 +-
 fs/nfs/nfs4xdr.c          |   99 +++++++++++++++++++++++++++++++++++++--------
 include/linux/nfs_fs_sb.h |    3 +-
 include/linux/nfs_xdr.h   |    3 +-
 6 files changed, 91 insertions(+), 22 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a9b1848..de00a37 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -936,6 +936,7 @@ static void nfs_server_set_fsinfo(struct nfs_server *server,
 	if (server->wsize > NFS_MAX_FILE_IO_SIZE)
 		server->wsize = NFS_MAX_FILE_IO_SIZE;
 	server->wpages = (server->wsize + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+	server->pnfs_blksize = fsinfo->blksize;
 	set_pnfs_layoutdriver(server, mntfh, fsinfo->layouttype);
 
 	server->wtmult = nfs_block_bits(fsinfo->wtmult, NULL);
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 1909ee8..1ec1a85 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -318,7 +318,7 @@ extern const struct nfs4_minor_version_ops *nfs_v4_minor_ops[];
 extern const u32 nfs4_fattr_bitmap[2];
 extern const u32 nfs4_statfs_bitmap[2];
 extern const u32 nfs4_pathconf_bitmap[2];
-extern const u32 nfs4_fsinfo_bitmap[2];
+extern const u32 nfs4_fsinfo_bitmap[3];
 extern const u32 nfs4_fs_locations_bitmap[2];
 
 /* nfs4renewd.c */
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index af32d3d..e86de79 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -140,12 +140,13 @@ const u32 nfs4_pathconf_bitmap[2] = {
 	0
 };
 
-const u32 nfs4_fsinfo_bitmap[2] = { FATTR4_WORD0_MAXFILESIZE
+const u32 nfs4_fsinfo_bitmap[3] = { FATTR4_WORD0_MAXFILESIZE
 			| FATTR4_WORD0_MAXREAD
 			| FATTR4_WORD0_MAXWRITE
 			| FATTR4_WORD0_LEASE_TIME,
 			FATTR4_WORD1_TIME_DELTA
-			| FATTR4_WORD1_FS_LAYOUT_TYPES
+			| FATTR4_WORD1_FS_LAYOUT_TYPES,
+			FATTR4_WORD2_LAYOUT_BLKSIZE
 };
 
 const u32 nfs4_fs_locations_bitmap[2] = {
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index c149cbe..f32dde9 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -113,7 +113,11 @@ static int nfs4_stat_to_errno(int);
 #define encode_restorefh_maxsz  (op_encode_hdr_maxsz)
 #define decode_restorefh_maxsz  (op_decode_hdr_maxsz)
 #define encode_fsinfo_maxsz	(encode_getattr_maxsz)
-#define decode_fsinfo_maxsz	(op_decode_hdr_maxsz + 15)
+/* The 5 accounts for the pNFS attributes (fs_layout_types and
+ * layout_blksize), assuming at most three layout types are returned.
+ */
+#define decode_fsinfo_maxsz	(op_decode_hdr_maxsz + \
+				 nfs4_fattr_bitmap_maxsz + 4 + 8 + 5)
 #define encode_renew_maxsz	(op_encode_hdr_maxsz + 3)
 #define decode_renew_maxsz	(op_decode_hdr_maxsz)
 #define encode_setclientid_maxsz \
@@ -1123,6 +1127,35 @@ static void encode_getattr_two(struct xdr_stream *xdr, uint32_t bm0, uint32_t bm
 	hdr->replen += decode_getattr_maxsz;
 }
 
+static void
+encode_getattr_three(struct xdr_stream *xdr,
+		     uint32_t bm0, uint32_t bm1, uint32_t bm2,
+		     struct compound_hdr *hdr)
+{
+	__be32 *p;
+
+	p = reserve_space(xdr, 4);
+	*p = cpu_to_be32(OP_GETATTR);
+	if (bm2) {
+		p = reserve_space(xdr, 16);
+		*p++ = cpu_to_be32(3);
+		*p++ = cpu_to_be32(bm0);
+		*p++ = cpu_to_be32(bm1);
+		*p = cpu_to_be32(bm2);
+	} else if (bm1) {
+		p = reserve_space(xdr, 12);
+		*p++ = cpu_to_be32(2);
+		*p++ = cpu_to_be32(bm0);
+		*p = cpu_to_be32(bm1);
+	} else {
+		p = reserve_space(xdr, 8);
+		*p++ = cpu_to_be32(1);
+		*p = cpu_to_be32(bm0);
+	}
+	hdr->nops++;
+	hdr->replen += decode_getattr_maxsz;
+}
+
 static void encode_getfattr(struct xdr_stream *xdr, const u32* bitmask, struct compound_hdr *hdr)
 {
 	encode_getattr_two(xdr, bitmask[0] & nfs4_fattr_bitmap[0],
@@ -1131,8 +1164,11 @@ static void encode_getfattr(struct xdr_stream *xdr, const u32* bitmask, struct c
 
 static void encode_fsinfo(struct xdr_stream *xdr, const u32* bitmask, struct compound_hdr *hdr)
 {
-	encode_getattr_two(xdr, bitmask[0] & nfs4_fsinfo_bitmap[0],
-			   bitmask[1] & nfs4_fsinfo_bitmap[1], hdr);
+	encode_getattr_three(xdr,
+			     bitmask[0] & nfs4_fsinfo_bitmap[0],
+			     bitmask[1] & nfs4_fsinfo_bitmap[1],
+			     bitmask[2] & nfs4_fsinfo_bitmap[2],
+			     hdr);
 }
 
 static void encode_fs_locations(struct xdr_stream *xdr, const u32* bitmask, struct compound_hdr *hdr)
@@ -2643,7 +2679,7 @@ static void nfs4_xdr_enc_setclientid_confirm(struct rpc_rqst *req,
 	struct compound_hdr hdr = {
 		.nops	= 0,
 	};
-	const u32 lease_bitmap[2] = { FATTR4_WORD0_LEASE_TIME, 0 };
+	const u32 lease_bitmap[3] = { FATTR4_WORD0_LEASE_TIME };
 
 	encode_compound_hdr(xdr, req, &hdr);
 	encode_setclientid_confirm(xdr, arg, &hdr);
@@ -2787,7 +2823,7 @@ static void nfs4_xdr_enc_get_lease_time(struct rpc_rqst *req,
 	struct compound_hdr hdr = {
 		.minorversion = nfs4_xdr_minorversion(&args->la_seq_args),
 	};
-	const u32 lease_bitmap[2] = { FATTR4_WORD0_LEASE_TIME, 0 };
+	const u32 lease_bitmap[3] = { FATTR4_WORD0_LEASE_TIME };
 
 	encode_compound_hdr(xdr, req, &hdr);
 	encode_sequence(xdr, &args->la_seq_args, &hdr);
@@ -3068,14 +3104,17 @@ static int decode_attr_bitmap(struct xdr_stream *xdr, uint32_t *bitmap)
 		goto out_overflow;
 	bmlen = be32_to_cpup(p);
 
-	bitmap[0] = bitmap[1] = 0;
+	bitmap[0] = bitmap[1] = bitmap[2] = 0;
 	p = xdr_inline_decode(xdr, (bmlen << 2));
 	if (unlikely(!p))
 		goto out_overflow;
 	if (bmlen > 0) {
 		bitmap[0] = be32_to_cpup(p++);
-		if (bmlen > 1)
-			bitmap[1] = be32_to_cpup(p);
+		if (bmlen > 1) {
+			bitmap[1] = be32_to_cpup(p++);
+			if (bmlen > 2)
+				bitmap[2] = be32_to_cpup(p);
+		}
 	}
 	return 0;
 out_overflow:
@@ -3107,8 +3146,9 @@ static int decode_attr_supported(struct xdr_stream *xdr, uint32_t *bitmap, uint3
 			return ret;
 		bitmap[0] &= ~FATTR4_WORD0_SUPPORTED_ATTRS;
 	} else
-		bitmask[0] = bitmask[1] = 0;
-	dprintk("%s: bitmask=%08x:%08x\n", __func__, bitmask[0], bitmask[1]);
+		bitmask[0] = bitmask[1] = bitmask[2] = 0;
+	dprintk("%s: bitmask=%08x:%08x:%08x\n", __func__,
+		bitmask[0], bitmask[1], bitmask[2]);
 	return 0;
 }
 
@@ -4162,7 +4202,7 @@ out_overflow:
 static int decode_server_caps(struct xdr_stream *xdr, struct nfs4_server_caps_res *res)
 {
 	__be32 *savep;
-	uint32_t attrlen, bitmap[2] = {0};
+	uint32_t attrlen, bitmap[3] = {0};
 	int status;
 
 	if ((status = decode_op_hdr(xdr, OP_GETATTR)) != 0)
@@ -4188,7 +4228,7 @@ xdr_error:
 static int decode_statfs(struct xdr_stream *xdr, struct nfs_fsstat *fsstat)
 {
 	__be32 *savep;
-	uint32_t attrlen, bitmap[2] = {0};
+	uint32_t attrlen, bitmap[3] = {0};
 	int status;
 
 	if ((status = decode_op_hdr(xdr, OP_GETATTR)) != 0)
@@ -4220,7 +4260,7 @@ xdr_error:
 static int decode_pathconf(struct xdr_stream *xdr, struct nfs_pathconf *pathconf)
 {
 	__be32 *savep;
-	uint32_t attrlen, bitmap[2] = {0};
+	uint32_t attrlen, bitmap[3] = {0};
 	int status;
 
 	if ((status = decode_op_hdr(xdr, OP_GETATTR)) != 0)
@@ -4360,7 +4400,7 @@ static int decode_getfattr_generic(struct xdr_stream *xdr, struct nfs_fattr *fat
 {
 	__be32 *savep;
 	uint32_t attrlen,
-		 bitmap[2] = {0};
+		 bitmap[3] = {0};
 	int status;
 
 	status = decode_op_hdr(xdr, OP_GETATTR);
@@ -4446,10 +4486,32 @@ static int decode_attr_pnfstype(struct xdr_stream *xdr, uint32_t *bitmap,
 	return status;
 }
 
+/*
+ * The preferred block size for layout-directed I/O
+ */
+static int decode_attr_layout_blksize(struct xdr_stream *xdr, uint32_t *bitmap,
+				      uint32_t *res)
+{
+	__be32 *p;
+
+	dprintk("%s: bitmap is %x\n", __func__, bitmap[2]);
+	*res = 0;
+	if (bitmap[2] & FATTR4_WORD2_LAYOUT_BLKSIZE) {
+		p = xdr_inline_decode(xdr, 4);
+		if (unlikely(!p)) {
+			print_overflow_msg(__func__, xdr);
+			return -EIO;
+		}
+		*res = be32_to_cpup(p);
+		bitmap[2] &= ~FATTR4_WORD2_LAYOUT_BLKSIZE;
+	}
+	return 0;
+}
+
 static int decode_fsinfo(struct xdr_stream *xdr, struct nfs_fsinfo *fsinfo)
 {
 	__be32 *savep;
-	uint32_t attrlen, bitmap[2];
+	uint32_t attrlen, bitmap[3];
 	int status;
 
 	if ((status = decode_op_hdr(xdr, OP_GETATTR)) != 0)
@@ -4477,6 +4539,9 @@ static int decode_fsinfo(struct xdr_stream *xdr, struct nfs_fsinfo *fsinfo)
 	status = decode_attr_pnfstype(xdr, bitmap, &fsinfo->layouttype);
 	if (status != 0)
 		goto xdr_error;
+	status = decode_attr_layout_blksize(xdr, bitmap, &fsinfo->blksize);
+	if (status)
+		goto xdr_error;
 
 	status = verify_attr_len(xdr, savep, attrlen);
 xdr_error:
@@ -4896,7 +4961,7 @@ static int decode_getacl(struct xdr_stream *xdr, struct rpc_rqst *req,
 {
 	__be32 *savep;
 	uint32_t attrlen,
-		 bitmap[2] = {0};
+		 bitmap[3] = {0};
 	struct kvec *iov = req->rq_rcv_buf.head;
 	int status;
 
@@ -6849,7 +6914,7 @@ out:
 int nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 		       int plus)
 {
-	uint32_t bitmap[2] = {0};
+	uint32_t bitmap[3] = {0};
 	uint32_t len;
 	__be32 *p = xdr_inline_decode(xdr, 4);
 	if (unlikely(!p))
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 4faeac8..b2ea8b8 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -132,7 +132,7 @@ struct nfs_server {
 #endif
 
 #ifdef CONFIG_NFS_V4
-	u32			attr_bitmask[2];/* V4 bitmask representing the set
+	u32			attr_bitmask[3];/* V4 bitmask representing the set
 						   of attributes supported on this
 						   filesystem */
 	u32			cache_consistency_bitmask[2];
@@ -145,6 +145,7 @@ struct nfs_server {
 						   filesystem */
 	struct pnfs_layoutdriver_type  *pnfs_curr_ld; /* Active layout driver */
 	struct rpc_wait_queue	roc_rpcwaitq;
+	u32			pnfs_blksize;	/* layout_blksize attr */
 
 	/* the following fields are protected by nfs_client->cl_lock */
 	struct rb_root		state_owners;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 21f333e..94f27e5 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -122,6 +122,7 @@ struct nfs_fsinfo {
 	struct timespec		time_delta; /* server time granularity */
 	__u32			lease_time; /* in seconds */
 	__u32			layouttype; /* supported pnfs layout driver */
+	__u32			blksize; /* preferred pnfs io block size */
 };
 
 struct nfs_fsstat {
@@ -954,7 +955,7 @@ struct nfs4_server_caps_arg {
 };
 
 struct nfs4_server_caps_res {
-	u32				attr_bitmask[2];
+	u32				attr_bitmask[3];
 	u32				acl_bitmask;
 	u32				has_links;
 	u32				has_symlinks;
-- 
1.7.4.1
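On the reply side, decode_attr_bitmap() now keeps up to three words. decode_attr_layout_blksize() consumes one 32-bit value when the FATTR4_WORD2_LAYOUT_BLKSIZE bit is set and leaves the result at 0 otherwise. The sketch below follows that sequence in user space. decode_blksize() and WORD2_LAYOUT_BLKSIZE are hypothetical names, and for brevity the sketch assumes layout_blksize is the only attribute value in the reply.

```c
#include <arpa/inet.h>	/* ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* layout_blksize is attribute 65 (RFC 5661): word 2, bit 1. */
#define WORD2_LAYOUT_BLKSIZE (1u << 1)

/*
 * buf holds big-endian XDR words: the bitmap length, the bitmap words,
 * then the attribute value(s).  Returns 0 on success, -1 if buf is
 * too short.
 */
static int decode_blksize(const uint32_t *buf, size_t nwords,
			  uint32_t bitmap[3], uint32_t *blksize)
{
	size_t pos = 0;
	uint32_t bmlen, i;

	assert(buf != NULL && bitmap != NULL && blksize != NULL);
	if (nwords < 1)
		return -1;
	bmlen = ntohl(buf[pos++]);
	if (bmlen > nwords - pos)
		return -1;

	bitmap[0] = bitmap[1] = bitmap[2] = 0;
	for (i = 0; i < bmlen; i++) {
		uint32_t w = ntohl(buf[pos++]);

		if (i < 3)
			bitmap[i] = w;	/* words past the third are skipped */
	}

	*blksize = 0;		/* 0 means the server did not return it */
	if (bitmap[2] & WORD2_LAYOUT_BLKSIZE) {
		if (pos >= nwords)
			return -1;
		*blksize = ntohl(buf[pos++]);
		bitmap[2] &= ~WORD2_LAYOUT_BLKSIZE;
	}
	return 0;
}
```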



* [PATCH v3 06/25] pnfs: cleanup_layoutcommit
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
                   ` (4 preceding siblings ...)
  2011-07-27 18:40 ` [PATCH v3 05/25] pnfs: ask for layout_blksize and save it in nfs_server Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 19:53   ` Trond Myklebust
  2011-07-27 18:40 ` [PATCH v3 07/25] pnfsblock: add blocklayout Kconfig option, Makefile, and stubs Jim Rees
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Andy Adamson <andros@netapp.com>

This gives the layout driver a chance to clean up structures it set up in
encode_layoutcommit.
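pnfs_cleanup_layoutcommit() calls the driver's cleanup_layoutcommit() only if the driver has set it. Layout drivers that leave the new member NULL are therefore unaffected. The sketch below shows that optional-hook pattern in user space. struct ld_ops, cleanup_layoutcommit(), counting_cleanup() and the two driver instances are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct layoutcommit_data;	/* opaque in this sketch */

/* Hypothetical, trimmed-down stand-in for struct pnfs_layoutdriver_type. */
struct ld_ops {
	const char *name;
	/* Optional: a driver that allocates nothing may leave this NULL. */
	void (*cleanup_layoutcommit)(struct layoutcommit_data *data);
};

/* Same shape as pnfs_cleanup_layoutcommit(): call the hook only if set. */
static void cleanup_layoutcommit(const struct ld_ops *ld,
				 struct layoutcommit_data *data)
{
	assert(ld != NULL);
	if (ld->cleanup_layoutcommit)
		ld->cleanup_layoutcommit(data);
}

static int cleanups;

static void counting_cleanup(struct layoutcommit_data *data)
{
	(void)data;
	cleanups++;
}

static const struct ld_ops no_hook_ld = { "no-hook", NULL };
static const struct ld_ops hook_ld = { "with-hook", counting_cleanup };
```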

Signed-off-by: Andy Adamson <andros@netapp.com>
[fixup layout header pointer for layoutcommit]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/nfs4proc.c       |    1 +
 fs/nfs/nfs4xdr.c        |    1 +
 fs/nfs/pnfs.c           |   10 ++++++++++
 fs/nfs/pnfs.h           |    5 +++++
 include/linux/nfs_xdr.h |    1 +
 5 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index e86de79..6cb84b4 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5963,6 +5963,7 @@ static void nfs4_layoutcommit_release(void *calldata)
 	struct nfs4_layoutcommit_data *data = calldata;
 	struct pnfs_layout_segment *lseg, *tmp;
 
+	pnfs_cleanup_layoutcommit(data->args.inode, data);
 	/* Matched by references in pnfs_set_layoutcommit */
 	list_for_each_entry_safe(lseg, tmp, &data->lseg_list, pls_lc_list) {
 		list_del_init(&lseg->pls_lc_list);
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index f32dde9..ea2da30 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -5596,6 +5596,7 @@ static int decode_layoutcommit(struct xdr_stream *xdr,
 	int status;
 
 	status = decode_op_hdr(xdr, OP_LAYOUTCOMMIT);
+	res->status = status;
 	if (status)
 		return status;
 
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index fbebd2a..3b20753 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1411,6 +1411,16 @@ pnfs_set_layoutcommit(struct nfs_write_data *wdata)
 }
 EXPORT_SYMBOL_GPL(pnfs_set_layoutcommit);
 
+void pnfs_cleanup_layoutcommit(struct inode *inode,
+			       struct nfs4_layoutcommit_data *data)
+{
+	struct nfs_server *nfss = NFS_SERVER(inode);
+
+	if (nfss->pnfs_curr_ld->cleanup_layoutcommit)
+		nfss->pnfs_curr_ld->cleanup_layoutcommit(NFS_I(inode)->layout,
+							 data);
+}
+
 /*
  * For the LAYOUT4_NFSV4_1_FILES layout type, NFS_DATA_SYNC WRITEs and
  * NFS_UNSTABLE WRITEs with a COMMIT to data servers must store enough
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index bddd8b9..f271425 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -113,6 +113,9 @@ struct pnfs_layoutdriver_type {
 				     struct xdr_stream *xdr,
 				     const struct nfs4_layoutreturn_args *args);
 
+	void (*cleanup_layoutcommit) (struct pnfs_layout_hdr *layoutid,
+				      struct nfs4_layoutcommit_data *data);
+
 	void (*encode_layoutcommit) (struct pnfs_layout_hdr *layoutid,
 				     struct xdr_stream *xdr,
 				     const struct nfs4_layoutcommit_args *args);
@@ -196,6 +199,8 @@ void pnfs_roc_release(struct inode *ino);
 void pnfs_roc_set_barrier(struct inode *ino, u32 barrier);
 bool pnfs_roc_drain(struct inode *ino, u32 *barrier);
 void pnfs_set_layoutcommit(struct nfs_write_data *wdata);
+void pnfs_cleanup_layoutcommit(struct inode *inode,
+			       struct nfs4_layoutcommit_data *data);
 int pnfs_layoutcommit_inode(struct inode *inode, bool sync);
 int _pnfs_return_layout(struct inode *);
 int pnfs_ld_write_done(struct nfs_write_data *);
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 94f27e5..569ea5b 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -269,6 +269,7 @@ struct nfs4_layoutcommit_res {
 	struct nfs_fattr *fattr;
 	const struct nfs_server *server;
 	struct nfs4_sequence_res seq_res;
+	int status;
 };
 
 struct nfs4_layoutcommit_data {
-- 
1.7.4.1
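decode_layoutcommit() now records the operation status in res->status. Once the reply has been decoded, a driver's cleanup_layoutcommit() can presumably use that status to tell whether the server accepted the commit. The illustration below shows how a cleanup hook might settle or requeue the work that was in flight. struct pending_commit and cleanup_after_layoutcommit() are hypothetical and not part of this patch.

```c
#include <assert.h>

/* Hypothetical bookkeeping a layout driver might keep while a
 * LAYOUTCOMMIT is in flight. */
struct pending_commit {
	int committing;	/* items sent in this LAYOUTCOMMIT */
	int committed;	/* acknowledged by the server */
	int to_resend;	/* must go out again in a later LAYOUTCOMMIT */
};

/* status: assumed to be 0 on success and a negative errno on failure,
 * as decode_op_hdr() returns and decode_layoutcommit() now stores. */
static void cleanup_after_layoutcommit(struct pending_commit *pc, int status)
{
	assert(pc->committing >= 0);
	if (status == 0)
		pc->committed += pc->committing;
	else
		pc->to_resend += pc->committing;
	pc->committing = 0;
}
```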



* [PATCH v3 07/25] pnfsblock: add blocklayout Kconfig option, Makefile, and stubs
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
                   ` (5 preceding siblings ...)
  2011-07-27 18:40 ` [PATCH v3 06/25] pnfs: cleanup_layoutcommit Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 18:40 ` [PATCH v3 08/25] pnfsblock: basic extent code Jim Rees
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

Define a configuration variable to enable/disable compilation of the
block driver code.
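When built as a module, the driver declares MODULE_ALIAS("nfs-layouttype4-3"). The number 3 is LAYOUT4_BLOCK_VOLUME in the layouttype4 enum of RFC 5661, so the module can be found from the layout type number alone. The sketch below composes that name. layout_module_alias() is a hypothetical helper, and the sketch assumes the pNFS core requests layout modules using the same "nfs-layouttype4-<N>" pattern.

```c
#include <assert.h>
#include <stdio.h>

/* Layout type numbers from RFC 5661 (layouttype4). */
enum layouttype4 {
	LAYOUT4_NFSV4_1_FILES = 1,
	LAYOUT4_OSD2_OBJECTS  = 2,
	LAYOUT4_BLOCK_VOLUME  = 3,
};

/* Write the module alias for a layout type into buf; returns the
 * snprintf() result (the length the full name needs). */
static int layout_module_alias(char *buf, size_t len, unsigned int type)
{
	assert(buf != NULL);
	return snprintf(buf, len, "nfs-layouttype4-%u", type);
}
```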

Add the minimal structure for a pNFS block layout driver, and empty
list heads that will hold the extent data.
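bl_alloc_layout_hdr() allocates the larger struct pnfs_block_layout but hands the generic pNFS code only &bl->bl_layout. BLK_LO2EXT() later recovers the containing structure with container_of(). The sketch below shows the same embedding in user space. The struct names are hypothetical, and the container_of() is simplified (the kernel macro also type-checks ptr).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): subtract the member's offset. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for pnfs_layout_hdr and pnfs_block_layout. */
struct layout_hdr {
	int refcount;
};

struct block_layout {
	unsigned long blocksize;	/* driver-private field placed first */
	struct layout_hdr layout;	/* generic header at a non-zero offset */
};

/* Analogue of BLK_LO2EXT(): generic header -> driver-private struct. */
static struct block_layout *lo2ext(struct layout_hdr *lo)
{
	assert(lo != NULL);
	return container_of(lo, struct block_layout, layout);
}
```

Here the header is deliberately not the first member, to show that the offset subtraction matters. In the patch, bl_layout happens to come first.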

[pnfsblock: make NFS_V4_1 select PNFS_BLOCK]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs-block: fix CONFIG_PNFS_BLOCK dependencies]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: SQUASHME: port block layout code]
Signed-off-by: Peng Tao <peng_tao@emc.com>
[pnfsblock: SQUASHME: adjust to API change]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfs: move pnfs_layout_type inline in nfs_inode]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[blocklayout: encode_layoutcommit implementation]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: layout alloc and free]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfs: move pnfs_layout_type inline in nfs_inode]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: define module alias]
Signed-off-by: Peng Tao <peng_tao@emc.com>
---
 fs/nfs/Kconfig                   |    8 ++-
 fs/nfs/Makefile                  |    1 +
 fs/nfs/blocklayout/Makefile      |    5 +
 fs/nfs/blocklayout/blocklayout.c |  175 ++++++++++++++++++++++++++++++++++++++
 fs/nfs/blocklayout/blocklayout.h |   91 ++++++++++++++++++++
 5 files changed, 279 insertions(+), 1 deletions(-)
 create mode 100644 fs/nfs/blocklayout/Makefile
 create mode 100644 fs/nfs/blocklayout/blocklayout.c
 create mode 100644 fs/nfs/blocklayout/blocklayout.h

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 2cde5d9..be02077 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -79,15 +79,21 @@ config NFS_V4_1
 	depends on NFS_FS && NFS_V4 && EXPERIMENTAL
 	select SUNRPC_BACKCHANNEL
 	select PNFS_FILE_LAYOUT
+	select PNFS_BLOCK
+	select MD
+	select BLK_DEV_DM
 	help
 	  This option enables support for minor version 1 of the NFSv4 protocol
-	  (RFC 5661) in the kernel's NFS client.
+	  (RFC 5661 and RFC 5663) in the kernel's NFS client.
 
 	  If unsure, say N.
 
 config PNFS_FILE_LAYOUT
 	tristate
 
+config PNFS_BLOCK
+	tristate
+
 config PNFS_OBJLAYOUT
 	tristate "Provide support for the pNFS Objects Layout Driver for NFSv4.1 pNFS (EXPERIMENTAL)"
 	depends on NFS_FS && NFS_V4_1 && SCSI_OSD_ULD
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 6a34f7d..b58613d 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -23,3 +23,4 @@ obj-$(CONFIG_PNFS_FILE_LAYOUT) += nfs_layout_nfsv41_files.o
 nfs_layout_nfsv41_files-y := nfs4filelayout.o nfs4filelayoutdev.o
 
 obj-$(CONFIG_PNFS_OBJLAYOUT) += objlayout/
+obj-$(CONFIG_PNFS_BLOCK) += blocklayout/
diff --git a/fs/nfs/blocklayout/Makefile b/fs/nfs/blocklayout/Makefile
new file mode 100644
index 0000000..6bf49cd
--- /dev/null
+++ b/fs/nfs/blocklayout/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the pNFS block layout driver kernel module
+#
+obj-$(CONFIG_PNFS_BLOCK) += blocklayoutdriver.o
+blocklayoutdriver-objs := blocklayout.o
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
new file mode 100644
index 0000000..55a2a95
--- /dev/null
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -0,0 +1,175 @@
+/*
+ *  linux/fs/nfs/blocklayout/blocklayout.c
+ *
+ *  Module for the NFSv4.1 pNFS block layout driver.
+ *
+ *  Copyright (c) 2006 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Andy Adamson <andros@citi.umich.edu>
+ *  Fred Isaman <iisaman@umich.edu>
+ *
+ * permission is granted to use, copy, create derivative works and
+ * redistribute this software and such derivative works for any purpose,
+ * so long as the name of the university of michigan is not used in
+ * any advertising or publicity pertaining to the use or distribution
+ * of this software without specific, written prior authorization.  if
+ * the above copyright notice or any other identification of the
+ * university of michigan is included in any copy of any portion of
+ * this software, then the disclaimer below must also be included.
+ *
+ * this software is provided as is, without representation from the
+ * university of michigan as to its fitness for any purpose, and without
+ * warranty by the university of michigan of any kind, either express
+ * or implied, including without limitation the implied warranties of
+ * merchantability and fitness for a particular purpose.  the regents
+ * of the university of michigan shall not be liable for any damages,
+ * including special, indirect, incidental, or consequential damages,
+ * with respect to any claim arising out or in connection with the use
+ * of the software, even if it has been or is hereafter advised of the
+ * possibility of such damages.
+ */
+#include <linux/module.h>
+#include <linux/init.h>
+
+#include "blocklayout.h"
+
+#define NFSDBG_FACILITY	NFSDBG_PNFS_LD
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Andy Adamson <andros@citi.umich.edu>");
+MODULE_DESCRIPTION("The NFSv4.1 pNFS Block layout driver");
+
+static enum pnfs_try_status
+bl_read_pagelist(struct nfs_read_data *rdata)
+{
+	return PNFS_NOT_ATTEMPTED;
+}
+
+static enum pnfs_try_status
+bl_write_pagelist(struct nfs_write_data *wdata,
+		  int sync)
+{
+	return PNFS_NOT_ATTEMPTED;
+}
+
+/* STUB */
+static void
+release_extents(struct pnfs_block_layout *bl,
+		struct pnfs_layout_range *range)
+{
+	return;
+}
+
+/* STUB */
+static void
+release_inval_marks(struct pnfs_inval_markings *marks)
+{
+	return;
+}
+
+static void bl_free_layout_hdr(struct pnfs_layout_hdr *lo)
+{
+	struct pnfs_block_layout *bl = BLK_LO2EXT(lo);
+
+	dprintk("%s enter\n", __func__);
+	release_extents(bl, NULL);
+	release_inval_marks(&bl->bl_inval);
+	kfree(bl);
+}
+
+static struct pnfs_layout_hdr *bl_alloc_layout_hdr(struct inode *inode,
+						   gfp_t gfp_flags)
+{
+	struct pnfs_block_layout *bl;
+
+	dprintk("%s enter\n", __func__);
+	bl = kzalloc(sizeof(*bl), gfp_flags);
+	if (!bl)
+		return NULL;
+	spin_lock_init(&bl->bl_ext_lock);
+	INIT_LIST_HEAD(&bl->bl_extents[0]);
+	INIT_LIST_HEAD(&bl->bl_extents[1]);
+	INIT_LIST_HEAD(&bl->bl_commit);
+	INIT_LIST_HEAD(&bl->bl_committing);
+	bl->bl_count = 0;
+	bl->bl_blocksize = NFS_SERVER(inode)->pnfs_blksize >> SECTOR_SHIFT;
+	INIT_INVAL_MARKS(&bl->bl_inval, bl->bl_blocksize);
+	return &bl->bl_layout;
+}
+
+static void
+bl_free_lseg(struct pnfs_layout_segment *lseg)
+{
+}
+
+static struct pnfs_layout_segment *
+bl_alloc_lseg(struct pnfs_layout_hdr *lo,
+	      struct nfs4_layoutget_res *lgr, gfp_t gfp_flags)
+{
+	return NULL;
+}
+
+static void
+bl_encode_layoutcommit(struct pnfs_layout_hdr *lo, struct xdr_stream *xdr,
+		       const struct nfs4_layoutcommit_args *arg)
+{
+}
+
+static void
+bl_cleanup_layoutcommit(struct pnfs_layout_hdr *lo,
+			struct nfs4_layoutcommit_data *lcdata)
+{
+}
+
+static int
+bl_set_layoutdriver(struct nfs_server *server, const struct nfs_fh *fh)
+{
+	dprintk("%s enter\n", __func__);
+	return 0;
+}
+
+static int
+bl_clear_layoutdriver(struct nfs_server *server)
+{
+	dprintk("%s enter\n", __func__);
+	return 0;
+}
+
+static struct pnfs_layoutdriver_type blocklayout_type = {
+	.id				= LAYOUT_BLOCK_VOLUME,
+	.name				= "LAYOUT_BLOCK_VOLUME",
+	.read_pagelist			= bl_read_pagelist,
+	.write_pagelist			= bl_write_pagelist,
+	.alloc_layout_hdr		= bl_alloc_layout_hdr,
+	.free_layout_hdr		= bl_free_layout_hdr,
+	.alloc_lseg			= bl_alloc_lseg,
+	.free_lseg			= bl_free_lseg,
+	.encode_layoutcommit		= bl_encode_layoutcommit,
+	.cleanup_layoutcommit		= bl_cleanup_layoutcommit,
+	.set_layoutdriver		= bl_set_layoutdriver,
+	.clear_layoutdriver		= bl_clear_layoutdriver,
+};
+
+static int __init nfs4blocklayout_init(void)
+{
+	int ret;
+
+	dprintk("%s: NFSv4 Block Layout Driver Registering...\n", __func__);
+
+	ret = pnfs_register_layoutdriver(&blocklayout_type);
+	return ret;
+}
+
+static void __exit nfs4blocklayout_exit(void)
+{
+	dprintk("%s: NFSv4 Block Layout Driver Unregistering...\n",
+	       __func__);
+
+	pnfs_unregister_layoutdriver(&blocklayout_type);
+}
+
+MODULE_ALIAS("nfs-layouttype4-3");
+
+module_init(nfs4blocklayout_init);
+module_exit(nfs4blocklayout_exit);
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
new file mode 100644
index 0000000..0e6ae06
--- /dev/null
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -0,0 +1,91 @@
+/*
+ *  linux/fs/nfs/blocklayout/blocklayout.h
+ *
+ *  Module for the NFSv4.1 pNFS block layout driver.
+ *
+ *  Copyright (c) 2006 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Andy Adamson <andros@citi.umich.edu>
+ *  Fred Isaman <iisaman@umich.edu>
+ *
+ * permission is granted to use, copy, create derivative works and
+ * redistribute this software and such derivative works for any purpose,
+ * so long as the name of the university of michigan is not used in
+ * any advertising or publicity pertaining to the use or distribution
+ * of this software without specific, written prior authorization.  if
+ * the above copyright notice or any other identification of the
+ * university of michigan is included in any copy of any portion of
+ * this software, then the disclaimer below must also be included.
+ *
+ * this software is provided as is, without representation from the
+ * university of michigan as to its fitness for any purpose, and without
+ * warranty by the university of michigan of any kind, either express
+ * or implied, including without limitation the implied warranties of
+ * merchantability and fitness for a particular purpose.  the regents
+ * of the university of michigan shall not be liable for any damages,
+ * including special, indirect, incidental, or consequential damages,
+ * with respect to any claim arising out or in connection with the use
+ * of the software, even if it has been or is hereafter advised of the
+ * possibility of such damages.
+ */
+#ifndef FS_NFS_NFS4BLOCKLAYOUT_H
+#define FS_NFS_NFS4BLOCKLAYOUT_H
+
+#include <linux/device-mapper.h>
+#include <linux/nfs_fs.h>
+#include "../pnfs.h"
+
+enum exstate4 {
+	PNFS_BLOCK_READWRITE_DATA	= 0,
+	PNFS_BLOCK_READ_DATA		= 1,
+	PNFS_BLOCK_INVALID_DATA		= 2, /* mapped, but data is invalid */
+	PNFS_BLOCK_NONE_DATA		= 3  /* unmapped, it's a hole */
+};
+
+struct pnfs_inval_markings {
+	/* STUB */
+};
+
+/* sector_t fields are all in 512-byte sectors */
+struct pnfs_block_extent {
+	struct kref	be_refcnt;
+	struct list_head be_node;	/* link into lseg list */
+	struct nfs4_deviceid be_devid;  /* FIXME: could use device cache instead */
+	struct block_device *be_mdev;
+	sector_t	be_f_offset;	/* the starting offset in the file */
+	sector_t	be_length;	/* the size of the extent */
+	sector_t	be_v_offset;	/* the starting offset in the volume */
+	enum exstate4	be_state;	/* the state of this extent */
+	struct pnfs_inval_markings *be_inval; /* tracks INVAL->RW transition */
+};
+
+static inline void
+INIT_INVAL_MARKS(struct pnfs_inval_markings *marks, sector_t blocksize)
+{
+	/* STUB */
+}
+
+enum extentclass4 {
+	RW_EXTENT       = 0, /* READWRITE and INVAL */
+	RO_EXTENT       = 1, /* READ and NONE */
+	EXTENT_LISTS    = 2,
+};
+
+struct pnfs_block_layout {
+	struct pnfs_layout_hdr bl_layout;
+	struct pnfs_inval_markings bl_inval; /* tracks INVAL->RW transition */
+	spinlock_t		bl_ext_lock;   /* Protects list manipulation */
+	struct list_head	bl_extents[EXTENT_LISTS]; /* R and RW extents */
+	struct list_head	bl_commit;	/* Needs layout commit */
+	struct list_head	bl_committing;	/* Layout committing */
+	unsigned int		bl_count;	/* entries in bl_commit */
+	sector_t		bl_blocksize;  /* Server blocksize in sectors */
+};
+
+static inline struct pnfs_block_layout *BLK_LO2EXT(struct pnfs_layout_hdr *lo)
+{
+	return container_of(lo, struct pnfs_block_layout, bl_layout);
+}
+
+#endif /* FS_NFS_NFS4BLOCKLAYOUT_H */
-- 
1.7.4.1
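bl_alloc_layout_hdr() stores the server's layout_blksize, which is given in bytes, as a count of 512-byte sectors. It does this by shifting right by SECTOR_SHIFT (9, assumed here to come from <linux/device-mapper.h>, which blocklayout.h includes). The sketch below shows that arithmetic using a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Same conversion as pnfs_blksize >> SECTOR_SHIFT in the patch. */
static uint64_t blksize_to_sectors(uint32_t blksize_bytes)
{
	return (uint64_t)blksize_bytes >> SECTOR_SHIFT;
}
```

For example, 4096 bytes becomes 8 sectors. A value that is not a multiple of 512 is truncated. If the server does not return layout_blksize, decode_attr_layout_blksize() leaves it at 0, so the sector count is also 0.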



* [PATCH v3 08/25] pnfsblock: basic extent code
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
                   ` (6 preceding siblings ...)
  2011-07-27 18:40 ` [PATCH v3 07/25] pnfsblock: add blocklayout Kconfig option, Makefile, and stubs Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 18:40 ` [PATCH v3 09/25] pnfsblock: add device operations Jim Rees
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

Add structures and basic create/delete code for extents.
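Extents are reference counted with a struct kref. alloc_extent() starts the count at 1, get_extent() adds a reference, and bl_put_extent() frees the extent through destroy_extent() when the last reference is dropped. The sketch below reproduces that lifecycle in user space, with C11 atomics in place of kref. struct extent and the extent_*() helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical extent with an embedded reference count. */
struct extent {
	atomic_int refcnt;
	unsigned long long f_offset, length;	/* in 512-byte sectors */
};

static int extents_freed;

/* Like alloc_extent(): the caller owns the initial reference. */
static struct extent *extent_alloc(void)
{
	struct extent *be = calloc(1, sizeof(*be));

	if (be)
		atomic_init(&be->refcnt, 1);
	return be;
}

/* Like get_extent(): tolerates NULL, takes one more reference. */
static struct extent *extent_get(struct extent *be)
{
	if (be)
		atomic_fetch_add(&be->refcnt, 1);
	return be;
}

/* Like bl_put_extent(): drop a reference, free on the last one. */
static void extent_put(struct extent *be)
{
	if (!be)
		return;
	assert(atomic_load(&be->refcnt) > 0);
	if (atomic_fetch_sub(&be->refcnt, 1) == 1) {	/* was the last ref */
		free(be);
		extents_freed++;
	}
}
```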

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Zhang Jingwang <Jingwang.Zhang@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/blocklayout/Makefile      |    2 +-
 fs/nfs/blocklayout/blocklayout.c |   20 ++++++--
 fs/nfs/blocklayout/blocklayout.h |    1 +
 fs/nfs/blocklayout/extents.c     |   97 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 115 insertions(+), 5 deletions(-)
 create mode 100644 fs/nfs/blocklayout/extents.c

diff --git a/fs/nfs/blocklayout/Makefile b/fs/nfs/blocklayout/Makefile
index 6bf49cd..5cfadf6 100644
--- a/fs/nfs/blocklayout/Makefile
+++ b/fs/nfs/blocklayout/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the pNFS block layout driver kernel module
 #
 obj-$(CONFIG_PNFS_BLOCK) += blocklayoutdriver.o
-blocklayoutdriver-objs := blocklayout.o
+blocklayoutdriver-objs := blocklayout.o extents.o
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 55a2a95..291cc01 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -53,12 +53,24 @@ bl_write_pagelist(struct nfs_write_data *wdata,
 	return PNFS_NOT_ATTEMPTED;
 }
 
-/* STUB */
+/* FIXME - range ignored */
 static void
-release_extents(struct pnfs_block_layout *bl,
-		struct pnfs_layout_range *range)
+release_extents(struct pnfs_block_layout *bl, struct pnfs_layout_range *range)
 {
-	return;
+	int i;
+	struct pnfs_block_extent *be;
+
+	spin_lock(&bl->bl_ext_lock);
+	for (i = 0; i < EXTENT_LISTS; i++) {
+		while (!list_empty(&bl->bl_extents[i])) {
+			be = list_first_entry(&bl->bl_extents[i],
+					      struct pnfs_block_extent,
+					      be_node);
+			list_del(&be->be_node);
+			bl_put_extent(be);
+		}
+	}
+	spin_unlock(&bl->bl_ext_lock);
 }
 
 /* STUB */
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index 0e6ae06..3fec302 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -88,4 +88,5 @@ static inline struct pnfs_block_layout *BLK_LO2EXT(struct pnfs_layout_hdr *lo)
 	return container_of(lo, struct pnfs_block_layout, bl_layout);
 }
 
+void bl_put_extent(struct pnfs_block_extent *be);
 #endif /* FS_NFS_NFS4BLOCKLAYOUT_H */
diff --git a/fs/nfs/blocklayout/extents.c b/fs/nfs/blocklayout/extents.c
new file mode 100644
index 0000000..44c3364
--- /dev/null
+++ b/fs/nfs/blocklayout/extents.c
@@ -0,0 +1,97 @@
+/*
+ *  linux/fs/nfs/blocklayout/extents.c
+ *
+ *  Module for the NFSv4.1 pNFS block layout driver.
+ *
+ *  Copyright (c) 2006 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Andy Adamson <andros@citi.umich.edu>
+ *  Fred Isaman <iisaman@umich.edu>
+ *
+ * permission is granted to use, copy, create derivative works and
+ * redistribute this software and such derivative works for any purpose,
+ * so long as the name of the university of michigan is not used in
+ * any advertising or publicity pertaining to the use or distribution
+ * of this software without specific, written prior authorization.  if
+ * the above copyright notice or any other identification of the
+ * university of michigan is included in any copy of any portion of
+ * this software, then the disclaimer below must also be included.
+ *
+ * this software is provided as is, without representation from the
+ * university of michigan as to its fitness for any purpose, and without
+ * warranty by the university of michigan of any kind, either express
+ * or implied, including without limitation the implied warranties of
+ * merchantability and fitness for a particular purpose.  the regents
+ * of the university of michigan shall not be liable for any damages,
+ * including special, indirect, incidental, or consequential damages,
+ * with respect to any claim arising out or in connection with the use
+ * of the software, even if it has been or is hereafter advised of the
+ * possibility of such damages.
+ */
+
+#include "blocklayout.h"
+#define NFSDBG_FACILITY         NFSDBG_PNFS_LD
+
+static void print_bl_extent(struct pnfs_block_extent *be)
+{
+	dprintk("PRINT EXTENT extent %p\n", be);
+	if (be) {
+		dprintk("        be_f_offset %llu\n", (u64)be->be_f_offset);
+		dprintk("        be_length   %llu\n", (u64)be->be_length);
+		dprintk("        be_v_offset %llu\n", (u64)be->be_v_offset);
+		dprintk("        be_state    %d\n", be->be_state);
+	}
+}
+
+static void
+destroy_extent(struct kref *kref)
+{
+	struct pnfs_block_extent *be;
+
+	be = container_of(kref, struct pnfs_block_extent, be_refcnt);
+	dprintk("%s be=%p\n", __func__, be);
+	kfree(be);
+}
+
+void
+bl_put_extent(struct pnfs_block_extent *be)
+{
+	if (be) {
+		dprintk("%s enter %p (%i)\n", __func__, be,
+			atomic_read(&be->be_refcnt.refcount));
+		kref_put(&be->be_refcnt, destroy_extent);
+	}
+}
+
+struct pnfs_block_extent *alloc_extent(void)
+{
+	struct pnfs_block_extent *be;
+
+	be = kmalloc(sizeof(struct pnfs_block_extent), GFP_NOFS);
+	if (!be)
+		return NULL;
+	INIT_LIST_HEAD(&be->be_node);
+	kref_init(&be->be_refcnt);
+	be->be_inval = NULL;
+	return be;
+}
+
+struct pnfs_block_extent *
+get_extent(struct pnfs_block_extent *be)
+{
+	if (be)
+		kref_get(&be->be_refcnt);
+	return be;
+}
+
+void print_elist(struct list_head *list)
+{
+	struct pnfs_block_extent *be;
+	dprintk("****************\n");
+	dprintk("Extent list looks like:\n");
+	list_for_each_entry(be, list, be_node) {
+		print_bl_extent(be);
+	}
+	dprintk("****************\n");
+}
-- 
1.7.4.1


* [PATCH v3 09/25] pnfsblock: add device operations
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[upcall bugfixes]
Signed-off-by: Peng Tao <peng_tao@emc.com>
---
 fs/nfs/blocklayout/Makefile         |    2 +-
 fs/nfs/blocklayout/blocklayout.c    |   42 ++++++++
 fs/nfs/blocklayout/blocklayout.h    |   40 +++++++
 fs/nfs/blocklayout/blocklayoutdev.c |  191 +++++++++++++++++++++++++++++++++++
 fs/nfs/client.c                     |    2 +-
 include/linux/nfs.h                 |    2 +
 6 files changed, 277 insertions(+), 2 deletions(-)
 create mode 100644 fs/nfs/blocklayout/blocklayoutdev.c

diff --git a/fs/nfs/blocklayout/Makefile b/fs/nfs/blocklayout/Makefile
index 5cfadf6..5bf3409 100644
--- a/fs/nfs/blocklayout/Makefile
+++ b/fs/nfs/blocklayout/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the pNFS block layout driver kernel module
 #
 obj-$(CONFIG_PNFS_BLOCK) += blocklayoutdriver.o
-blocklayoutdriver-objs := blocklayout.o extents.o
+blocklayoutdriver-objs := blocklayout.o extents.o blocklayoutdev.o
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 291cc01..3afe363 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -31,6 +31,8 @@
  */
 #include <linux/module.h>
 #include <linux/init.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
 
 #include "blocklayout.h"
 
@@ -40,6 +42,9 @@ MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Andy Adamson <andros@citi.umich.edu>");
 MODULE_DESCRIPTION("The NFSv4.1 pNFS Block layout driver");
 
+struct dentry *bl_device_pipe;
+wait_queue_head_t bl_wq;
+
 static enum pnfs_try_status
 bl_read_pagelist(struct nfs_read_data *rdata)
 {
@@ -163,13 +168,49 @@ static struct pnfs_layoutdriver_type blocklayout_type = {
 	.clear_layoutdriver		= bl_clear_layoutdriver,
 };
 
+static const struct rpc_pipe_ops bl_upcall_ops = {
+	.upcall		= bl_pipe_upcall,
+	.downcall	= bl_pipe_downcall,
+	.destroy_msg	= bl_pipe_destroy_msg,
+};
+
 static int __init nfs4blocklayout_init(void)
 {
+	struct nameidata nd;
+	struct path path;
 	int ret;
 
 	dprintk("%s: NFSv4 Block Layout Driver Registering...\n", __func__);
 
 	ret = pnfs_register_layoutdriver(&blocklayout_type);
+	if (ret)
+		goto out;
+
+	init_waitqueue_head(&bl_wq);
+
+	path.mnt = rpc_get_mount();
+	if (IS_ERR(path.mnt)) {
+		ret = PTR_ERR(path.mnt);
+		goto out_remove;
+	}
+
+	ret = vfs_path_lookup(path.mnt->mnt_root,
+			      path.mnt,
+			      NFS_PIPE_DIRNAME, 0, &nd);
+	if (ret)
+		goto out_remove;
+
+	bl_device_pipe = rpc_mkpipe(nd.path.dentry, "blocklayout", NULL,
+				    &bl_upcall_ops, 0);
+	if (IS_ERR(bl_device_pipe)) {
+		ret = PTR_ERR(bl_device_pipe);
+		goto out_remove;
+	}
+out:
+	return ret;
+
+out_remove:
+	pnfs_unregister_layoutdriver(&blocklayout_type);
 	return ret;
 }
 
@@ -179,6 +220,7 @@ static void __exit nfs4blocklayout_exit(void)
 	       __func__);
 
 	pnfs_unregister_layoutdriver(&blocklayout_type);
+	rpc_unlink(bl_device_pipe);
 }
 
 MODULE_ALIAS("nfs-layouttype4-3");
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index 3fec302..3dcc971 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -34,8 +34,16 @@
 
 #include <linux/device-mapper.h>
 #include <linux/nfs_fs.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
+
 #include "../pnfs.h"
 
+struct pnfs_block_dev {
+	struct list_head		bm_node;
+	struct nfs4_deviceid		bm_mdevid;    /* associated devid */
+	struct block_device		*bm_mdev;     /* meta device itself */
+};
+
 enum exstate4 {
 	PNFS_BLOCK_READWRITE_DATA	= 0,
 	PNFS_BLOCK_READ_DATA		= 1,
@@ -88,5 +96,37 @@ static inline struct pnfs_block_layout *BLK_LO2EXT(struct pnfs_layout_hdr *lo)
 	return container_of(lo, struct pnfs_block_layout, bl_layout);
 }
 
+struct bl_dev_msg {
+	int status;
+	uint32_t major, minor;
+};
+
+struct bl_msg_hdr {
+	u8  type;
+	u16 totallen; /* length of the payload that follows the hdr */
+};
+
+extern struct dentry *bl_device_pipe;
+extern wait_queue_head_t bl_wq;
+
+#define BL_DEVICE_UMOUNT               0x0 /* Umount--delete devices */
+#define BL_DEVICE_MOUNT                0x1 /* Mount--create devices */
+#define BL_DEVICE_REQUEST_INIT         0x0 /* Start request */
+#define BL_DEVICE_REQUEST_PROC         0x1 /* User level process succeeds */
+#define BL_DEVICE_REQUEST_ERR          0x2 /* User level process fails */
+
+/* blocklayoutdev.c */
+ssize_t bl_pipe_upcall(struct file *, struct rpc_pipe_msg *,
+		       char __user *, size_t);
+ssize_t bl_pipe_downcall(struct file *, const char __user *, size_t);
+void bl_pipe_destroy_msg(struct rpc_pipe_msg *);
+struct block_device *nfs4_blkdev_get(dev_t dev);
+int nfs4_blkdev_put(struct block_device *bdev);
+struct pnfs_block_dev *nfs4_blk_decode_device(struct nfs_server *server,
+						struct pnfs_device *dev,
+						struct list_head *sdlist);
+int nfs4_blk_process_layoutget(struct pnfs_layout_hdr *lo,
+				struct nfs4_layoutget_res *lgr, gfp_t gfp_flags);
+
 void bl_put_extent(struct pnfs_block_extent *be);
 #endif /* FS_NFS_NFS4BLOCKLAYOUT_H */
diff --git a/fs/nfs/blocklayout/blocklayoutdev.c b/fs/nfs/blocklayout/blocklayoutdev.c
new file mode 100644
index 0000000..7e1377f
--- /dev/null
+++ b/fs/nfs/blocklayout/blocklayoutdev.c
@@ -0,0 +1,191 @@
+/*
+ *  linux/fs/nfs/blocklayout/blocklayoutdev.c
+ *
+ *  Device operations for the pnfs nfs4 block layout driver.
+ *
+ *  Copyright (c) 2006 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Andy Adamson <andros@citi.umich.edu>
+ *  Fred Isaman <iisaman@umich.edu>
+ *
+ * permission is granted to use, copy, create derivative works and
+ * redistribute this software and such derivative works for any purpose,
+ * so long as the name of the university of michigan is not used in
+ * any advertising or publicity pertaining to the use or distribution
+ * of this software without specific, written prior authorization.  if
+ * the above copyright notice or any other identification of the
+ * university of michigan is included in any copy of any portion of
+ * this software, then the disclaimer below must also be included.
+ *
+ * this software is provided as is, without representation from the
+ * university of michigan as to its fitness for any purpose, and without
+ * warranty by the university of michigan of any kind, either express
+ * or implied, including without limitation the implied warranties of
+ * merchantability and fitness for a particular purpose.  the regents
+ * of the university of michigan shall not be liable for any damages,
+ * including special, indirect, incidental, or consequential damages,
+ * with respect to any claim arising out or in connection with the use
+ * of the software, even if it has been or is hereafter advised of the
+ * possibility of such damages.
+ */
+#include <linux/module.h>
+#include <linux/buffer_head.h> /* __bread */
+
+#include <linux/genhd.h>
+#include <linux/blkdev.h>
+#include <linux/hash.h>
+
+#include "blocklayout.h"
+
+#define NFSDBG_FACILITY         NFSDBG_PNFS_LD
+
+/* Open a block_device by device number. */
+struct block_device *nfs4_blkdev_get(dev_t dev)
+{
+	struct block_device *bd;
+
+	dprintk("%s enter\n", __func__);
+	bd = blkdev_get_by_dev(dev, FMODE_READ, NULL);
+	if (IS_ERR(bd))
+		goto fail;
+	return bd;
+fail:
+	dprintk("%s failed to open device : %ld\n",
+			__func__, PTR_ERR(bd));
+	return NULL;
+}
+
+/*
+ * Release the block device
+ */
+int nfs4_blkdev_put(struct block_device *bdev)
+{
+	dprintk("%s for device %d:%d\n", __func__, MAJOR(bdev->bd_dev),
+			MINOR(bdev->bd_dev));
+	return blkdev_put(bdev, FMODE_READ);
+}
+
+/*
+ * Shouldn't there be a rpc_generic_upcall() to do this for us?
+ */
+ssize_t bl_pipe_upcall(struct file *filp, struct rpc_pipe_msg *msg,
+		       char __user *dst, size_t buflen)
+{
+	char *data = (char *)msg->data + msg->copied;
+	size_t mlen = min(msg->len - msg->copied, buflen);
+	unsigned long left;
+
+	left = copy_to_user(dst, data, mlen);
+	if (left == mlen) {
+		msg->errno = -EFAULT;
+		return -EFAULT;
+	}
+
+	mlen -= left;
+	msg->copied += mlen;
+	msg->errno = 0;
+	return mlen;
+}
+
+static struct bl_dev_msg bl_mount_reply;
+
+ssize_t bl_pipe_downcall(struct file *filp, const char __user *src,
+			 size_t mlen)
+{
+	if (mlen != sizeof (struct bl_dev_msg))
+		return -EINVAL;
+
+	if (copy_from_user(&bl_mount_reply, src, mlen) != 0)
+		return -EFAULT;
+
+	wake_up(&bl_wq);
+
+	return mlen;
+}
+
+void bl_pipe_destroy_msg(struct rpc_pipe_msg *msg)
+{
+	if (msg->errno >= 0)
+		return;
+	wake_up(&bl_wq);
+}
+
+/*
+ * Decodes pnfs_block_deviceaddr4 which is XDR encoded in dev->dev_addr_buf.
+ */
+struct pnfs_block_dev *
+nfs4_blk_decode_device(struct nfs_server *server,
+		       struct pnfs_device *dev,
+		       struct list_head *sdlist)
+{
+	struct pnfs_block_dev *rv = NULL;
+	struct block_device *bd = NULL;
+	struct rpc_pipe_msg msg;
+	struct bl_msg_hdr bl_msg = {
+		.type = BL_DEVICE_MOUNT,
+		.totallen = dev->mincount,
+	};
+	uint8_t *dataptr;
+	DECLARE_WAITQUEUE(wq, current);
+	struct bl_dev_msg *reply = &bl_mount_reply;
+
+	dprintk("%s CREATING PIPEFS MESSAGE\n", __func__);
+	dprintk("%s: deviceid: %s, mincount: %d\n", __func__, dev->dev_id.data,
+		dev->mincount);
+
+	memset(&msg, 0, sizeof(msg));
+	msg.data = kzalloc(sizeof(bl_msg) + dev->mincount, GFP_NOFS);
+	if (!msg.data) {
+		rv = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	memcpy(msg.data, &bl_msg, sizeof(bl_msg));
+	dataptr = (uint8_t *) msg.data;
+	memcpy(&dataptr[sizeof(bl_msg)], dev->area, dev->mincount);
+	msg.len = sizeof(bl_msg) + dev->mincount;
+
+	dprintk("%s CALLING USERSPACE DAEMON\n", __func__);
+	add_wait_queue(&bl_wq, &wq);
+	if (rpc_queue_upcall(bl_device_pipe->d_inode, &msg) < 0) {
+		remove_wait_queue(&bl_wq, &wq);
+		goto out;
+	}
+
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	schedule();
+	__set_current_state(TASK_RUNNING);
+	remove_wait_queue(&bl_wq, &wq);
+
+	if (reply->status != BL_DEVICE_REQUEST_PROC) {
+		dprintk("%s failed to open device: %d\n",
+			__func__, reply->status);
+		rv = ERR_PTR(-EINVAL);
+		goto out;
+	}
+
+	bd = nfs4_blkdev_get(MKDEV(reply->major, reply->minor));
+	if (!bd) {
+		dprintk("%s failed to open device %u:%u\n",
+			__func__, reply->major, reply->minor);
+		goto out;
+	}
+
+	rv = kzalloc(sizeof(*rv), GFP_NOFS);
+	if (!rv) {
+		rv = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	rv->bm_mdev = bd;
+	memcpy(&rv->bm_mdevid, &dev->dev_id, sizeof(struct nfs4_deviceid));
+	dprintk("%s Created device %s with bd_block_size %u\n",
+		__func__,
+		bd->bd_disk->disk_name,
+		bd->bd_block_size);
+
+out:
+	kfree(msg.data);
+	return rv;
+}
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index de00a37..5833fbb 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -105,7 +105,7 @@ struct rpc_program nfs_program = {
 	.nrvers			= ARRAY_SIZE(nfs_version),
 	.version		= nfs_version,
 	.stats			= &nfs_rpcstat,
-	.pipe_dir_name		= "/nfs",
+	.pipe_dir_name		= NFS_PIPE_DIRNAME,
 };
 
 struct rpc_stat nfs_rpcstat = {
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index f387919..8c6ee44 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -29,6 +29,8 @@
 #define NFS_MNT_VERSION		1
 #define NFS_MNT3_VERSION	3
 
+#define NFS_PIPE_DIRNAME "/nfs"
+
 /*
  * NFS stats. The good thing with these values is that NFSv3 errors are
  * a superset of NFSv2 errors (with the exception of NFSERR_WFLUSH which
-- 
1.7.4.1
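
For readers following the pipe protocol in patch 09: each upcall is a struct bl_msg_hdr followed immediately by the payload, and nfs4_blk_decode_device() and dev_remove() fill totallen with the payload length. The user-space sketch below mirrors those two structs with <stdint.h> types (an assumption about ABI compatibility, not something the patch guarantees) and shows why the buffer should be sized as sizeof(struct bl_msg_hdr) plus the payload: the compiler normally pads the u8 type field before the u16 totallen. The helpers bl_build_msg() and bl_parse_msg() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* User-space mirrors of the structs in blocklayout.h (assumed layout). */
struct bl_msg_hdr {
	uint8_t  type;
	uint16_t totallen;	/* payload length, as the patch fills it */
};

struct bl_dev_msg {
	int status;
	uint32_t major, minor;
};

#define BL_DEVICE_UMOUNT 0x0
#define BL_DEVICE_MOUNT  0x1

/* Hypothetical helper: build header + payload in one zeroed buffer.
 * Sizing uses sizeof(hdr), which includes any padding after 'type'. */
static uint8_t *bl_build_msg(uint8_t type, const void *payload,
			     uint16_t len, size_t *out_len)
{
	struct bl_msg_hdr hdr;
	uint8_t *buf;

	assert(payload != NULL || len == 0);
	memset(&hdr, 0, sizeof(hdr));
	hdr.type = type;
	hdr.totallen = len;

	buf = calloc(1, sizeof(hdr) + len);
	if (!buf)
		return NULL;
	memcpy(buf, &hdr, sizeof(hdr));
	if (len)
		memcpy(buf + sizeof(hdr), payload, len);
	*out_len = sizeof(hdr) + len;
	return buf;
}

/* Hypothetical helper: split a received buffer into header and payload.
 * Returns NULL if the buffer is shorter than the header claims. */
static const uint8_t *bl_parse_msg(const uint8_t *buf, size_t buflen,
				   struct bl_msg_hdr *hdr)
{
	if (buflen < sizeof(*hdr))
		return NULL;
	memcpy(hdr, buf, sizeof(*hdr));
	if (buflen - sizeof(*hdr) < hdr->totallen)
		return NULL;
	return buf + sizeof(*hdr);
}
```

A umount request, for example, is built with bl_build_msg(BL_DEVICE_UMOUNT, &req, sizeof(req), &n), where req is a struct bl_dev_msg carrying the major and minor numbers.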


* [PATCH v3 10/25] pnfsblock: remove device operations
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[upcall bugfixes]
Signed-off-by: Peng Tao <peng_tao@emc.com>
---
 fs/nfs/blocklayout/Makefile        |    2 +-
 fs/nfs/blocklayout/blocklayout.h   |    3 +
 fs/nfs/blocklayout/blocklayoutdm.c |  111 ++++++++++++++++++++++++++++++++++++
 3 files changed, 115 insertions(+), 1 deletions(-)
 create mode 100644 fs/nfs/blocklayout/blocklayoutdm.c

diff --git a/fs/nfs/blocklayout/Makefile b/fs/nfs/blocklayout/Makefile
index 5bf3409..d581550 100644
--- a/fs/nfs/blocklayout/Makefile
+++ b/fs/nfs/blocklayout/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the pNFS block layout driver kernel module
 #
 obj-$(CONFIG_PNFS_BLOCK) += blocklayoutdriver.o
-blocklayoutdriver-objs := blocklayout.o extents.o blocklayoutdev.o
+blocklayoutdriver-objs := blocklayout.o extents.o blocklayoutdev.o blocklayoutdm.o
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index 3dcc971..9b88918 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -128,5 +128,8 @@ struct pnfs_block_dev *nfs4_blk_decode_device(struct nfs_server *server,
 int nfs4_blk_process_layoutget(struct pnfs_layout_hdr *lo,
 				struct nfs4_layoutget_res *lgr, gfp_t gfp_flags);
 
+/* blocklayoutdm.c */
+void free_block_dev(struct pnfs_block_dev *bdev);
+
 void bl_put_extent(struct pnfs_block_extent *be);
 #endif /* FS_NFS_NFS4BLOCKLAYOUT_H */
diff --git a/fs/nfs/blocklayout/blocklayoutdm.c b/fs/nfs/blocklayout/blocklayoutdm.c
new file mode 100644
index 0000000..eab95f3
--- /dev/null
+++ b/fs/nfs/blocklayout/blocklayoutdm.c
@@ -0,0 +1,111 @@
+/*
+ *  linux/fs/nfs/blocklayout/blocklayoutdm.c
+ *
+ *  Module for the NFSv4.1 pNFS block layout driver.
+ *
+ *  Copyright (c) 2007 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Fred Isaman <iisaman@umich.edu>
+ *  Andy Adamson <andros@citi.umich.edu>
+ *
+ * permission is granted to use, copy, create derivative works and
+ * redistribute this software and such derivative works for any purpose,
+ * so long as the name of the university of michigan is not used in
+ * any advertising or publicity pertaining to the use or distribution
+ * of this software without specific, written prior authorization.  if
+ * the above copyright notice or any other identification of the
+ * university of michigan is included in any copy of any portion of
+ * this software, then the disclaimer below must also be included.
+ *
+ * this software is provided as is, without representation from the
+ * university of michigan as to its fitness for any purpose, and without
+ * warranty by the university of michigan of any kind, either express
+ * or implied, including without limitation the implied warranties of
+ * merchantability and fitness for a particular purpose.  the regents
+ * of the university of michigan shall not be liable for any damages,
+ * including special, indirect, incidental, or consequential damages,
+ * with respect to any claim arising out or in connection with the use
+ * of the software, even if it has been or is hereafter advised of the
+ * possibility of such damages.
+ */
+
+#include <linux/genhd.h> /* gendisk - used in a dprintk*/
+#include <linux/sched.h>
+#include <linux/hash.h>
+
+#include "blocklayout.h"
+
+#define NFSDBG_FACILITY         NFSDBG_PNFS_LD
+
+static void dev_remove(dev_t dev)
+{
+	struct rpc_pipe_msg msg;
+	struct bl_dev_msg bl_umount_request;
+	struct bl_msg_hdr bl_msg = {
+		.type = BL_DEVICE_UMOUNT,
+		.totallen = sizeof(bl_umount_request),
+	};
+	uint8_t *dataptr;
+	DECLARE_WAITQUEUE(wq, current);
+
+	dprintk("Entering %s\n", __func__);
+
+	memset(&msg, 0, sizeof(msg));
+	msg.data = kzalloc(sizeof(bl_msg) + bl_msg.totallen, GFP_NOFS);
+	if (!msg.data)
+		goto out;
+
+	memset(&bl_umount_request, 0, sizeof(bl_umount_request));
+	bl_umount_request.major = MAJOR(dev);
+	bl_umount_request.minor = MINOR(dev);
+
+	memcpy(msg.data, &bl_msg, sizeof(bl_msg));
+	dataptr = (uint8_t *) msg.data;
+	memcpy(&dataptr[sizeof(bl_msg)], &bl_umount_request, sizeof(bl_umount_request));
+	msg.len = sizeof(bl_msg) + bl_msg.totallen;
+
+	add_wait_queue(&bl_wq, &wq);
+	if (rpc_queue_upcall(bl_device_pipe->d_inode, &msg) < 0) {
+		remove_wait_queue(&bl_wq, &wq);
+		goto out;
+	}
+
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	schedule();
+	__set_current_state(TASK_RUNNING);
+	remove_wait_queue(&bl_wq, &wq);
+
+out:
+	kfree(msg.data);
+}
+
+/*
+ * Release meta device
+ */
+static void nfs4_blk_metadev_release(struct pnfs_block_dev *bdev)
+{
+	int rv;
+
+	dprintk("%s Releasing\n", __func__);
+	rv = nfs4_blkdev_put(bdev->bm_mdev);
+	if (rv)
+		printk(KERN_ERR "%s nfs4_blkdev_put returns %d\n",
+				__func__, rv);
+
+	dev_remove(bdev->bm_mdev->bd_dev);
+}
+
+void free_block_dev(struct pnfs_block_dev *bdev)
+{
+	if (bdev) {
+		if (bdev->bm_mdev) {
+			dprintk("%s Removing DM device: %d:%d\n",
+				__func__,
+				MAJOR(bdev->bm_mdev->bd_dev),
+				MINOR(bdev->bm_mdev->bd_dev));
+			nfs4_blk_metadev_release(bdev);
+		}
+		kfree(bdev);
+	}
+}
-- 
1.7.4.1


* [PATCH v3 11/25] pnfsblock: lseg alloc and free
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: fix bug getting pnfs_layout_type in translate_devid().]
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Zhang Jingwang <Jingwang.Zhang@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/blocklayout/blocklayout.c    |   31 +++++++++++++++++++++++++------
 fs/nfs/blocklayout/blocklayout.h    |    6 ++++++
 fs/nfs/blocklayout/blocklayoutdev.c |    8 ++++++++
 3 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 3afe363..bec68ca 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -115,16 +115,35 @@ static struct pnfs_layout_hdr *bl_alloc_layout_hdr(struct inode *inode,
 	return &bl->bl_layout;
 }
 
-static void
-bl_free_lseg(struct pnfs_layout_segment *lseg)
+static void bl_free_lseg(struct pnfs_layout_segment *lseg)
 {
+	dprintk("%s enter\n", __func__);
+	kfree(lseg);
 }
 
-static struct pnfs_layout_segment *
-bl_alloc_lseg(struct pnfs_layout_hdr *lo,
-	      struct nfs4_layoutget_res *lgr, gfp_t gfp_flags)
+/* We pretty much ignore lseg, and store all data layout wide, so we
+ * can correctly merge.
+ */
+static struct pnfs_layout_segment *bl_alloc_lseg(struct pnfs_layout_hdr *lo,
+						 struct nfs4_layoutget_res *lgr,
+						 gfp_t gfp_flags)
 {
-	return NULL;
+	struct pnfs_layout_segment *lseg;
+	int status;
+
+	dprintk("%s enter\n", __func__);
+	lseg = kzalloc(sizeof(*lseg), gfp_flags);
+	if (!lseg)
+		return ERR_PTR(-ENOMEM);
+	status = nfs4_blk_process_layoutget(lo, lgr, gfp_flags);
+	if (status) {
+		/* We don't want to call the full-blown bl_free_lseg,
+		 * since on error extents were not touched.
+		 */
+		kfree(lseg);
+		return ERR_PTR(status);
+	}
+	return lseg;
 }
 
 static void
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index 9b88918..744c7a5 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -96,6 +96,12 @@ static inline struct pnfs_block_layout *BLK_LO2EXT(struct pnfs_layout_hdr *lo)
 	return container_of(lo, struct pnfs_block_layout, bl_layout);
 }
 
+static inline struct pnfs_block_layout *
+BLK_LSEG2EXT(struct pnfs_layout_segment *lseg)
+{
+	return BLK_LO2EXT(lseg->pls_layout);
+}
+
 struct bl_dev_msg {
 	int status;
 	uint32_t major, minor;
diff --git a/fs/nfs/blocklayout/blocklayoutdev.c b/fs/nfs/blocklayout/blocklayoutdev.c
index 7e1377f..64da33a 100644
--- a/fs/nfs/blocklayout/blocklayoutdev.c
+++ b/fs/nfs/blocklayout/blocklayoutdev.c
@@ -189,3 +189,11 @@ out:
 	kfree(msg.data);
 	return rv;
 }
+
+int
+nfs4_blk_process_layoutget(struct pnfs_layout_hdr *lo,
+			   struct nfs4_layoutget_res *lgr, gfp_t gfp_flags)
+{
+	/* STUB */
+	return -EIO;
+}
-- 
1.7.4.1


* [PATCH v3 12/25] pnfsblock: merge extents
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

Replace a stub, so that extents underlying the layouts are properly
added, merged, or ignored as necessary.
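
A simplified model of the merge rule in bl_add_merge_extent() may help when reviewing it. This sketch is not the patch's code. It keeps one sorted array instead of the per-state lists chosen by choose_list(), assumes a single device (so be_mdev is not compared), and the names ext, ext_list, and add_merge() are hypothetical. Consistent extents that overlap or abut the new one are absorbed into it, an inconsistent extent that merely abuts is kept, and an inconsistent overlap is rejected, where the patch returns -EIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of exstate4 (values as in RFC 5663). */
enum ext_state { EXT_RW = 0, EXT_READ = 1, EXT_INVALID = 2, EXT_NONE = 3 };

struct ext {
	uint64_t f_off;		/* file offset, in sectors */
	uint64_t len;		/* length, in sectors */
	uint64_t v_off;		/* volume offset, in sectors */
	enum ext_state state;
};

#define MAX_EXT 16

struct ext_list {
	struct ext e[MAX_EXT];	/* sorted by f_off, non-overlapping */
	size_t n;
};

/* Same test as extents_consistent(), minus the device comparison:
 * equal state, and either a hole or the same file-to-volume shift. */
static int consistent(const struct ext *a, const struct ext *b)
{
	return a->state == b->state &&
	       (a->state == EXT_NONE ||
		a->v_off - a->f_off == b->v_off - b->f_off);
}

/* Insert nw, absorbing every consistent extent it overlaps or abuts.
 * Returns 0 on success, -1 on an inconsistent overlap or a full list;
 * on failure the list is left untouched. */
static int add_merge(struct ext_list *l, struct ext nw)
{
	struct ext out[MAX_EXT];
	uint64_t end = nw.f_off + nw.len;
	size_t i, k = 0;

	assert(l->n <= MAX_EXT);
	for (i = 0; i < l->n; i++) {
		const struct ext *be = &l->e[i];
		uint64_t be_end = be->f_off + be->len;

		if (be_end < nw.f_off || be->f_off > end) {
			out[k++] = *be;		/* disjoint: keep */
			continue;
		}
		if (!consistent(be, &nw)) {
			if (be_end == nw.f_off || be->f_off == end) {
				out[k++] = *be;	/* merely abuts: keep */
				continue;
			}
			return -1;		/* conflicting overlap */
		}
		if (be->f_off < nw.f_off) {	/* grow nw to the left */
			nw.f_off = be->f_off;
			nw.v_off = be->v_off;
		}
		if (be_end > end)		/* grow nw to the right */
			end = be_end;
	}
	nw.len = end - nw.f_off;

	if (k >= MAX_EXT)
		return -1;
	for (i = k; i > 0 && out[i - 1].f_off > nw.f_off; i--)
		out[i] = out[i - 1];	/* shift to keep out[] sorted */
	out[i] = nw;
	k++;

	for (i = 0; i < k; i++)
		l->e[i] = out[i];
	l->n = k;
	return 0;
}
```

Rebuilding into a scratch array keeps the list unchanged on failure. As far as the diff shows, the in-place walk in the patch can detect a conflict only after earlier consistent extents have already been unlinked, so this sketch trades a copy for simpler error handling.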

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: delete the new node before putting it]
Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/blocklayout/blocklayout.h |   13 +++++
 fs/nfs/blocklayout/extents.c     |  106 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 119 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index 744c7a5..4411f77 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -80,6 +80,14 @@ enum extentclass4 {
 	EXTENT_LISTS    = 2,
 };
 
+static inline int choose_list(enum exstate4 state)
+{
+	if (state == PNFS_BLOCK_READ_DATA || state == PNFS_BLOCK_NONE_DATA)
+		return RO_EXTENT;
+	else
+		return RW_EXTENT;
+}
+
 struct pnfs_block_layout {
 	struct pnfs_layout_hdr bl_layout;
 	struct pnfs_inval_markings bl_inval; /* tracks INVAL->RW transition */
@@ -137,5 +145,10 @@ int nfs4_blk_process_layoutget(struct pnfs_layout_hdr *lo,
 /* blocklayoutdm.c */
 void free_block_dev(struct pnfs_block_dev *bdev);
 
+/* extents.c */
 void bl_put_extent(struct pnfs_block_extent *be);
+struct pnfs_block_extent *alloc_extent(void);
+int bl_add_merge_extent(struct pnfs_block_layout *bl,
+			 struct pnfs_block_extent *new);
+
 #endif /* FS_NFS_NFS4BLOCKLAYOUT_H */
diff --git a/fs/nfs/blocklayout/extents.c b/fs/nfs/blocklayout/extents.c
index 44c3364..3591084 100644
--- a/fs/nfs/blocklayout/extents.c
+++ b/fs/nfs/blocklayout/extents.c
@@ -95,3 +95,109 @@ void print_elist(struct list_head *list)
 	}
 	dprintk("****************\n");
 }
+
+static inline int
+extents_consistent(struct pnfs_block_extent *old, struct pnfs_block_extent *new)
+{
+	/* Note this assumes new->be_f_offset >= old->be_f_offset */
+	return (new->be_state == old->be_state) &&
+		((new->be_state == PNFS_BLOCK_NONE_DATA) ||
+		 ((new->be_v_offset - old->be_v_offset ==
+		   new->be_f_offset - old->be_f_offset) &&
+		  new->be_mdev == old->be_mdev));
+}
+
+/* Adds new to appropriate list in bl, modifying new and removing existing
+ * extents as appropriate to deal with overlaps.
+ *
+ * See bl_find_get_extent for list constraints.
+ *
+ * Refcount on new is already set.  If we end up not using it, or error
+ * out, we need to put the reference.
+ *
+ * bl->bl_ext_lock is held by caller.
+ */
+int
+bl_add_merge_extent(struct pnfs_block_layout *bl,
+		     struct pnfs_block_extent *new)
+{
+	struct pnfs_block_extent *be, *tmp;
+	sector_t end = new->be_f_offset + new->be_length;
+	struct list_head *list;
+
+	dprintk("%s enter with be=%p\n", __func__, new);
+	print_bl_extent(new);
+	list = &bl->bl_extents[choose_list(new->be_state)];
+	print_elist(list);
+
+	/* Scan for proper place to insert, extending new to the left
+	 * as much as possible.
+	 */
+	list_for_each_entry_safe(be, tmp, list, be_node) {
+		if (new->be_f_offset < be->be_f_offset)
+			break;
+		if (end <= be->be_f_offset + be->be_length) {
+			/* new is a subset of existing be*/
+			if (extents_consistent(be, new)) {
+				dprintk("%s: new is subset, ignoring\n",
+					__func__);
+				bl_put_extent(new);
+				return 0;
+			} else
+				goto out_err;
+		} else if (new->be_f_offset <=
+				be->be_f_offset + be->be_length) {
+			/* new overlaps or abuts existing be */
+			if (extents_consistent(be, new)) {
+				/* extend new to fully replace be */
+				new->be_length += new->be_f_offset -
+						  be->be_f_offset;
+				new->be_f_offset = be->be_f_offset;
+				new->be_v_offset = be->be_v_offset;
+				dprintk("%s: removing %p\n", __func__, be);
+				list_del(&be->be_node);
+				bl_put_extent(be);
+			} else if (new->be_f_offset !=
+				   be->be_f_offset + be->be_length)
+				goto out_err;
+		}
+	}
+	/* Note that if we never hit the above break, be will not point to a
+	 * valid extent.  However, in that case &be->be_node==list.
+	 */
+	list_add_tail(&new->be_node, &be->be_node);
+	dprintk("%s: inserting new\n", __func__);
+	print_elist(list);
+	/* Scan forward for overlaps.  If we find any, extend new and
+	 * remove the overlapped extent.
+	 */
+	be = list_prepare_entry(new, list, be_node);
+	list_for_each_entry_safe_continue(be, tmp, list, be_node) {
+		if (end < be->be_f_offset)
+			break;
+		/* new overlaps or abuts existing be */
+		if (extents_consistent(be, new)) {
+			if (end < be->be_f_offset + be->be_length) {
+				/* extend new to fully cover be */
+				end = be->be_f_offset + be->be_length;
+				new->be_length = end - new->be_f_offset;
+			}
+			dprintk("%s: removing %p\n", __func__, be);
+			list_del(&be->be_node);
+			bl_put_extent(be);
+		} else if (end != be->be_f_offset) {
+			list_del(&new->be_node);
+			goto out_err;
+		}
+	}
+	dprintk("%s: after merging\n", __func__);
+	print_elist(list);
+	/* FIXME - The per-list consistency checks have all been done,
+	 * should now check cross-list consistency.
+	 */
+	return 0;
+
+ out_err:
+	bl_put_extent(new);
+	return -EIO;
+}
-- 
1.7.4.1


* [PATCH v3 13/25] pnfsblock: call and parse getdevicelist
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

Call GETDEVICELIST during mount, then call and parse GETDEVICEINFO
for each device returned.
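
The loop in bl_set_layoutdriver() keeps issuing GETDEVICELIST until the server sets eof, then resolves each returned ID with GETDEVICEINFO. The sketch below models that control flow with hypothetical stand-ins (fake_getdevicelist(), fake_getdeviceinfo(), a five-device server, two IDs per reply). It threads a cookie between calls, as the GETDEVICELIST arguments in RFC 5661 allow; whether nfs4_proc_getdevicelist() in this series carries the cookie forward is not visible in this excerpt. Like the patch, it aborts on the first device that cannot be resolved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PER_CALL 2		/* hypothetical server reply size */
#define MAX_DEVS     8

/* Hypothetical server-side device table. */
static const uint32_t server_devs[] = { 11, 22, 33, 44, 55 };

/* Hypothetical stand-in for one GETDEVICELIST round trip: returns up to
 * MAX_PER_CALL ids starting at *cookie, advances the cookie, and sets
 * *eof once nothing is left to send. */
static int fake_getdevicelist(uint64_t *cookie, uint32_t *ids, int *num,
			      int *eof)
{
	size_t total = sizeof(server_devs) / sizeof(server_devs[0]);
	int n = 0;

	while (n < MAX_PER_CALL && *cookie < total)
		ids[n++] = server_devs[(*cookie)++];
	*num = n;
	*eof = (*cookie >= total);
	return 0;
}

/* Hypothetical stand-in for GETDEVICEINFO plus decode; id 0 "fails". */
static int fake_getdeviceinfo(uint32_t id)
{
	return id == 0 ? -1 : 0;
}

/* Collect every device id, failing on the first unresolved device or
 * when 'out' is full.  Returns the number collected, or -1. */
static int collect_devices(uint32_t *out, int max)
{
	uint64_t cookie = 0;
	int eof = 0, count = 0;

	while (!eof) {
		uint32_t ids[MAX_PER_CALL];
		int num, i;

		if (fake_getdevicelist(&cookie, ids, &num, &eof))
			return -1;
		for (i = 0; i < num; i++) {
			if (count >= max || fake_getdeviceinfo(ids[i]))
				return -1;
			out[count++] = ids[i];
		}
	}
	return count;
}
```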

[pnfsblock: get rid of deprecated xdr macros]
Signed-off-by: Jim Rees <rees@umich.edu>
[pnfsblock: fix pnfs_deviceid references]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: fix print format warnings for sector_t and size_t]
[pnfs-block: #include <linux/vmalloc.h>]
[pnfsblock: no PNFS_NFS_SERVER]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfsblock: fix bug determining size of striped volume]
[pnfsblock: fix oops when using multiple devices]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: get rid of vmap and deviceid->area structure]
Signed-off-by: Peng Tao <peng_tao@emc.com>
---
 fs/nfs/blocklayout/blocklayout.c    |  138 ++++++++++++++++++++++++++++++++++-
 fs/nfs/blocklayout/blocklayout.h    |   13 +++-
 fs/nfs/blocklayout/blocklayoutdev.c |   13 +++-
 fs/nfs/pnfs.h                       |    1 -
 include/linux/nfs_fs_sb.h           |    1 +
 5 files changed, 158 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index bec68ca..667bc50 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -158,17 +158,153 @@ bl_cleanup_layoutcommit(struct pnfs_layout_hdr *lo,
 {
 }
 
+static void free_blk_mountid(struct block_mount_id *mid)
+{
+	if (mid) {
+		struct pnfs_block_dev *dev;
+		spin_lock(&mid->bm_lock);
+		while (!list_empty(&mid->bm_devlist)) {
+			dev = list_first_entry(&mid->bm_devlist,
+					       struct pnfs_block_dev,
+					       bm_node);
+			list_del(&dev->bm_node);
+			free_block_dev(dev);
+		}
+		spin_unlock(&mid->bm_lock);
+		kfree(mid);
+	}
+}
+
+/* This is mostly copied from the filelayout's get_device_info function.
+ * It seems much of this should be at the generic pnfs level.
+ */
+static struct pnfs_block_dev *
+nfs4_blk_get_deviceinfo(struct nfs_server *server, const struct nfs_fh *fh,
+			struct nfs4_deviceid *d_id)
+{
+	struct pnfs_device *dev;
+	struct pnfs_block_dev *rv = NULL;
+	u32 max_resp_sz;
+	int max_pages;
+	struct page **pages = NULL;
+	int i, rc;
+
+	/*
+	 * Use the session max response size as the basis for setting
+	 * GETDEVICEINFO's maxcount
+	 */
+	max_resp_sz = server->nfs_client->cl_session->fc_attrs.max_resp_sz;
+	max_pages = max_resp_sz >> PAGE_SHIFT;
+	dprintk("%s max_resp_sz %u max_pages %d\n",
+		__func__, max_resp_sz, max_pages);
+
+	dev = kmalloc(sizeof(*dev), GFP_NOFS);
+	if (!dev) {
+		dprintk("%s kmalloc failed\n", __func__);
+		return NULL;
+	}
+
+	pages = kzalloc(max_pages * sizeof(struct page *), GFP_NOFS);
+	if (pages == NULL) {
+		kfree(dev);
+		return NULL;
+	}
+	for (i = 0; i < max_pages; i++) {
+		pages[i] = alloc_page(GFP_NOFS);
+		if (!pages[i])
+			goto out_free;
+	}
+
+	memcpy(&dev->dev_id, d_id, sizeof(*d_id));
+	dev->layout_type = LAYOUT_BLOCK_VOLUME;
+	dev->pages = pages;
+	dev->pgbase = 0;
+	dev->pglen = PAGE_SIZE * max_pages;
+	dev->mincount = 0;
+
+	dprintk("%s: dev_id: %s\n", __func__, dev->dev_id.data);
+	rc = nfs4_proc_getdeviceinfo(server, dev);
+	dprintk("%s getdevice info returns %d\n", __func__, rc);
+	if (rc)
+		goto out_free;
+
+	rv = nfs4_blk_decode_device(server, dev);
+ out_free:
+	for (i = 0; i < max_pages && pages[i]; i++)
+		__free_page(pages[i]);
+	kfree(pages);
+	kfree(dev);
+	return rv;
+}
+
 static int
 bl_set_layoutdriver(struct nfs_server *server, const struct nfs_fh *fh)
 {
+	struct block_mount_id *b_mt_id = NULL;
+	struct pnfs_devicelist *dlist = NULL;
+	struct pnfs_block_dev *bdev;
+	LIST_HEAD(block_disklist);
+	int status = 0, i;
+
 	dprintk("%s enter\n", __func__);
-	return 0;
+
+	if (server->pnfs_blksize == 0) {
+		dprintk("%s Server did not return blksize\n", __func__);
+		return -EINVAL;
+	}
+	b_mt_id = kzalloc(sizeof(struct block_mount_id), GFP_NOFS);
+	if (!b_mt_id) {
+		status = -ENOMEM;
+		goto out_error;
+	}
+	/* Initialize nfs4 block layout mount id */
+	spin_lock_init(&b_mt_id->bm_lock);
+	INIT_LIST_HEAD(&b_mt_id->bm_devlist);
+
+	dlist = kmalloc(sizeof(struct pnfs_devicelist), GFP_NOFS);
+	if (!dlist) {
+		status = -ENOMEM;
+		goto out_error;
+	}
+	dlist->eof = 0;
+	while (!dlist->eof) {
+		status = nfs4_proc_getdevicelist(server, fh, dlist);
+		if (status)
+			goto out_error;
+		dprintk("%s GETDEVICELIST numdevs=%i, eof=%i\n",
+			__func__, dlist->num_devs, dlist->eof);
+		for (i = 0; i < dlist->num_devs; i++) {
+			bdev = nfs4_blk_get_deviceinfo(server, fh,
+						       &dlist->dev_id[i]);
+			if (!bdev) {
+				status = -ENODEV;
+				goto out_error;
+			}
+			spin_lock(&b_mt_id->bm_lock);
+			list_add(&bdev->bm_node, &b_mt_id->bm_devlist);
+			spin_unlock(&b_mt_id->bm_lock);
+		}
+	}
+	dprintk("%s SUCCESS\n", __func__);
+	server->pnfs_ld_data = b_mt_id;
+
+ out_return:
+	kfree(dlist);
+	return status;
+
+ out_error:
+	free_blk_mountid(b_mt_id);
+	goto out_return;
 }
 
 static int
 bl_clear_layoutdriver(struct nfs_server *server)
 {
+	struct block_mount_id *b_mt_id = server->pnfs_ld_data;
+
 	dprintk("%s enter\n", __func__);
+	free_blk_mountid(b_mt_id);
+	dprintk("%s RETURNS\n", __func__);
 	return 0;
 }
 
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index 4411f77..3105b96 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -38,6 +38,11 @@
 
 #include "../pnfs.h"
 
+struct block_mount_id {
+	spinlock_t			bm_lock;    /* protects list */
+	struct list_head		bm_devlist; /* holds pnfs_block_dev */
+};
+
 struct pnfs_block_dev {
 	struct list_head		bm_node;
 	struct nfs4_deviceid		bm_mdevid;    /* associated devid */
@@ -99,7 +104,10 @@ struct pnfs_block_layout {
 	sector_t		bl_blocksize;  /* Server blocksize in sectors */
 };
 
-static inline struct pnfs_block_layout *BLK_LO2EXT(struct pnfs_layout_hdr *lo)
+#define BLK_ID(lo) ((struct block_mount_id *)(NFS_SERVER(lo->plh_inode)->pnfs_ld_data))
+
+static inline struct pnfs_block_layout *
+BLK_LO2EXT(struct pnfs_layout_hdr *lo)
 {
 	return container_of(lo, struct pnfs_block_layout, bl_layout);
 }
@@ -137,8 +145,7 @@ void bl_pipe_destroy_msg(struct rpc_pipe_msg *);
 struct block_device *nfs4_blkdev_get(dev_t dev);
 int nfs4_blkdev_put(struct block_device *bdev);
 struct pnfs_block_dev *nfs4_blk_decode_device(struct nfs_server *server,
-						struct pnfs_device *dev,
-						struct list_head *sdlist);
+						struct pnfs_device *dev);
 int nfs4_blk_process_layoutget(struct pnfs_layout_hdr *lo,
 				struct nfs4_layoutget_res *lgr, gfp_t gfp_flags);
 
diff --git a/fs/nfs/blocklayout/blocklayoutdev.c b/fs/nfs/blocklayout/blocklayoutdev.c
index 64da33a..b23fe60 100644
--- a/fs/nfs/blocklayout/blocklayoutdev.c
+++ b/fs/nfs/blocklayout/blocklayoutdev.c
@@ -116,8 +116,7 @@ void bl_pipe_destroy_msg(struct rpc_pipe_msg *msg)
  */
 struct pnfs_block_dev *
 nfs4_blk_decode_device(struct nfs_server *server,
-		       struct pnfs_device *dev,
-		       struct list_head *sdlist)
+		       struct pnfs_device *dev)
 {
 	struct pnfs_block_dev *rv = NULL;
 	struct block_device *bd = NULL;
@@ -129,6 +128,7 @@ nfs4_blk_decode_device(struct nfs_server *server,
 	uint8_t *dataptr;
 	DECLARE_WAITQUEUE(wq, current);
 	struct bl_dev_msg *reply = &bl_mount_reply;
+	int offset, len, i;
 
 	dprintk("%s CREATING PIPEFS MESSAGE\n", __func__);
 	dprintk("%s: deviceid: %s, mincount: %d\n", __func__, dev->dev_id.data,
@@ -143,7 +143,14 @@ nfs4_blk_decode_device(struct nfs_server *server,
 
 	memcpy(msg.data, &bl_msg, sizeof(bl_msg));
 	dataptr = (uint8_t *) msg.data;
-	memcpy(&dataptr[sizeof(bl_msg)], dev->area, dev->mincount);
+	len = dev->mincount;
+	offset = sizeof(bl_msg);
+	for (i = 0; len > 0; i++) {
+		memcpy(&dataptr[offset], page_address(dev->pages[i]),
+				len < PAGE_CACHE_SIZE ? len : PAGE_CACHE_SIZE);
+		len -= PAGE_CACHE_SIZE;
+		offset += PAGE_CACHE_SIZE;
+	}
 	msg.len = sizeof(bl_msg) + dev->mincount;
 
 	dprintk("%s CALLING USERSPACE DAEMON\n", __func__);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index f271425..82dde37 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -141,7 +141,6 @@ struct pnfs_device {
 	unsigned int  layout_type;
 	unsigned int  mincount;
 	struct page **pages;
-	void          *area;
 	unsigned int  pgbase;
 	unsigned int  pglen;
 };
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index b2ea8b8..cc03fc1 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -146,6 +146,7 @@ struct nfs_server {
 	struct pnfs_layoutdriver_type  *pnfs_curr_ld; /* Active layout driver */
 	struct rpc_wait_queue	roc_rpcwaitq;
 	u32			pnfs_blksize;	/* layout_blksize attr */
+	void			*pnfs_ld_data;	/* per mount point data */
 
 	/* the following fields are protected by nfs_client->cl_lock */
 	struct rb_root		state_owners;
-- 
1.7.4.1
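With `dev->area` removed, the new loop in nfs4_blk_decode_device() gathers the GETDEVICEINFO reply, which now lives in an array of pages, into the contiguous pipefs message buffer. The following is a minimal user-space sketch of that gather step. `gather_chunks()` and `CHUNK_SIZE` are hypothetical names, plain byte arrays stand in for pages, and `page_address()` is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PAGE_CACHE_SIZE, kept tiny so results are easy to check. */
#define CHUNK_SIZE 8

/*
 * Copy 'len' bytes spread across fixed-size chunks into the contiguous
 * buffer 'dst', starting at byte 'offset'. The last chunk may be only
 * partially used. Returns the number of chunks read.
 * Assumes chunks[] holds at least ceil(len / CHUNK_SIZE) entries and that
 * dst has room for offset + len bytes.
 */
static size_t gather_chunks(unsigned char *dst, size_t offset,
                            unsigned char *const *chunks, size_t len)
{
    size_t i;

    assert(dst != NULL && (chunks != NULL || len == 0));
    for (i = 0; len > 0; i++) {
        size_t n = len < CHUNK_SIZE ? len : CHUNK_SIZE;

        memcpy(dst + offset, chunks[i], n);
        len -= n;       /* unsigned: subtract only what was actually copied */
        offset += n;
    }
    return i;
}
```

The sketch subtracts only the bytes actually copied because `len` is a `size_t`. The patch instead uses a signed `int` and lets it go negative after the final, partial page.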



* [PATCH v3 14/25] pnfsblock: xdr decode pnfs_block_layout4
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

XDR-decode the block layout payload sent in the LAYOUTGET result,
storing the result in an extent list.

[pnfsblock: get rid of deprecated xdr macros]
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: fix bug getting pnfs_layout_type in translate_devid().]
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/blocklayout/blocklayoutdev.c |  208 ++++++++++++++++++++++++++++++++++-
 1 files changed, 206 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayoutdev.c b/fs/nfs/blocklayout/blocklayoutdev.c
index b23fe60..3bf8358 100644
--- a/fs/nfs/blocklayout/blocklayoutdev.c
+++ b/fs/nfs/blocklayout/blocklayoutdev.c
@@ -40,6 +40,19 @@
 
 #define NFSDBG_FACILITY         NFSDBG_PNFS_LD
 
+static int decode_sector_number(__be32 **rp, sector_t *sp)
+{
+	uint64_t s;
+
+	*rp = xdr_decode_hyper(*rp, &s);
+	if (s & 0x1ff) {
+		printk(KERN_WARNING "%s: sector not aligned\n", __func__);
+		return -1;
+	}
+	*sp = s >> SECTOR_SHIFT;
+	return 0;
+}
+
 /* Open a block_device by device number. */
 struct block_device *nfs4_blkdev_get(dev_t dev)
 {
@@ -197,10 +210,201 @@ out:
 	return rv;
 }
 
+/* Map deviceid returned by the server to constructed block_device */
+static struct block_device *translate_devid(struct pnfs_layout_hdr *lo,
+					    struct nfs4_deviceid *id)
+{
+	struct block_device *rv = NULL;
+	struct block_mount_id *mid;
+	struct pnfs_block_dev *dev;
+
+	dprintk("%s enter, lo=%p, id=%p\n", __func__, lo, id);
+	mid = BLK_ID(lo);
+	spin_lock(&mid->bm_lock);
+	list_for_each_entry(dev, &mid->bm_devlist, bm_node) {
+		if (memcmp(id->data, dev->bm_mdevid.data,
+			   NFS4_DEVICEID4_SIZE) == 0) {
+			rv = dev->bm_mdev;
+			goto out;
+		}
+	}
+ out:
+	spin_unlock(&mid->bm_lock);
+	dprintk("%s returning %p\n", __func__, rv);
+	return rv;
+}
+
+/* Tracks info needed to ensure extents in layout obey constraints of spec */
+struct layout_verification {
+	u32 mode;	/* R or RW */
+	u64 start;	/* Expected start of next non-COW extent */
+	u64 inval;	/* Start of INVAL coverage */
+	u64 cowread;	/* End of COW read coverage */
+};
+
+/* Verify the extent meets the layout requirements of the pnfs-block draft,
+ * section 2.3.1.
+ */
+static int verify_extent(struct pnfs_block_extent *be,
+			 struct layout_verification *lv)
+{
+	if (lv->mode == IOMODE_READ) {
+		if (be->be_state == PNFS_BLOCK_READWRITE_DATA ||
+		    be->be_state == PNFS_BLOCK_INVALID_DATA)
+			return -EIO;
+		if (be->be_f_offset != lv->start)
+			return -EIO;
+		lv->start += be->be_length;
+		return 0;
+	}
+	/* lv->mode == IOMODE_RW */
+	if (be->be_state == PNFS_BLOCK_READWRITE_DATA) {
+		if (be->be_f_offset != lv->start)
+			return -EIO;
+		if (lv->cowread > lv->start)
+			return -EIO;
+		lv->start += be->be_length;
+		lv->inval = lv->start;
+		return 0;
+	} else if (be->be_state == PNFS_BLOCK_INVALID_DATA) {
+		if (be->be_f_offset != lv->start)
+			return -EIO;
+		lv->start += be->be_length;
+		return 0;
+	} else if (be->be_state == PNFS_BLOCK_READ_DATA) {
+		if (be->be_f_offset > lv->start)
+			return -EIO;
+		if (be->be_f_offset < lv->inval)
+			return -EIO;
+		if (be->be_f_offset < lv->cowread)
+			return -EIO;
+		/* It looks like you might want to min this with lv->start,
+		 * but you really don't.
+		 */
+		lv->inval = lv->inval + be->be_length;
+		lv->cowread = be->be_f_offset + be->be_length;
+		return 0;
+	} else
+		return -EIO;
+}
+
+/* XDR decode pnfs_block_layout4 structure */
 int
 nfs4_blk_process_layoutget(struct pnfs_layout_hdr *lo,
 			   struct nfs4_layoutget_res *lgr, gfp_t gfp_flags)
 {
-	/* STUB */
-	return -EIO;
+	struct pnfs_block_layout *bl = BLK_LO2EXT(lo);
+	int i, status = -EIO;
+	uint32_t count;
+	struct pnfs_block_extent *be = NULL, *save;
+	struct xdr_stream stream;
+	struct xdr_buf buf;
+	struct page *scratch;
+	__be32 *p;
+	struct layout_verification lv = {
+		.mode = lgr->range.iomode,
+		.start = lgr->range.offset >> SECTOR_SHIFT,
+		.inval = lgr->range.offset >> SECTOR_SHIFT,
+		.cowread = lgr->range.offset >> SECTOR_SHIFT,
+	};
+	LIST_HEAD(extents);
+
+	dprintk("---> %s\n", __func__);
+
+	scratch = alloc_page(gfp_flags);
+	if (!scratch)
+		return -ENOMEM;
+
+	xdr_init_decode_pages(&stream, &buf, lgr->layoutp->pages, lgr->layoutp->len);
+	xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
+
+	p = xdr_inline_decode(&stream, 4);
+	if (unlikely(!p))
+		goto out_err;
+
+	count = be32_to_cpup(p++);
+
+	dprintk("%s enter, number of extents %i\n", __func__, count);
+	p = xdr_inline_decode(&stream, (28 + NFS4_DEVICEID4_SIZE) * count);
+	if (unlikely(!p))
+		goto out_err;
+
+	/* Decode individual extents, putting them in temporary
+	 * staging area until whole layout is decoded to make error
+	 * recovery easier.
+	 */
+	for (i = 0; i < count; i++) {
+		be = alloc_extent();
+		if (!be) {
+			status = -ENOMEM;
+			goto out_err;
+		}
+		memcpy(&be->be_devid, p, NFS4_DEVICEID4_SIZE);
+		p += XDR_QUADLEN(NFS4_DEVICEID4_SIZE);
+		be->be_mdev = translate_devid(lo, &be->be_devid);
+		if (!be->be_mdev)
+			goto out_err;
+
+		/* The next three values are read in as bytes,
+		 * but stored as 512-byte sector lengths
+		 */
+		if (decode_sector_number(&p, &be->be_f_offset) < 0)
+			goto out_err;
+		if (decode_sector_number(&p, &be->be_length) < 0)
+			goto out_err;
+		if (decode_sector_number(&p, &be->be_v_offset) < 0)
+			goto out_err;
+		be->be_state = be32_to_cpup(p++);
+		if (be->be_state == PNFS_BLOCK_INVALID_DATA)
+			be->be_inval = &bl->bl_inval;
+		if (verify_extent(be, &lv)) {
+			dprintk("%s verify failed\n", __func__);
+			goto out_err;
+		}
+		list_add_tail(&be->be_node, &extents);
+	}
+	if (lgr->range.offset + lgr->range.length !=
+			lv.start << SECTOR_SHIFT) {
+		dprintk("%s Final length mismatch\n", __func__);
+		be = NULL;
+		goto out_err;
+	}
+	if (lv.start < lv.cowread) {
+		dprintk("%s Final uncovered COW extent\n", __func__);
+		be = NULL;
+		goto out_err;
+	}
+	/* Extents decoded properly, now try to merge them in to
+	 * existing layout extents.
+	 */
+	spin_lock(&bl->bl_ext_lock);
+	list_for_each_entry_safe(be, save, &extents, be_node) {
+		list_del(&be->be_node);
+		status = bl_add_merge_extent(bl, be);
+		if (status) {
+			spin_unlock(&bl->bl_ext_lock);
+			/* This is a fairly catastrophic error, as the
+			 * entire layout extent lists are now corrupted.
+			 * We should have some way to distinguish this.
+			 */
+			be = NULL;
+			goto out_err;
+		}
+	}
+	spin_unlock(&bl->bl_ext_lock);
+	status = 0;
+ out:
+	__free_page(scratch);
+	dprintk("%s returns %i\n", __func__, status);
+	return status;
+
+ out_err:
+	bl_put_extent(be);
+	while (!list_empty(&extents)) {
+		be = list_first_entry(&extents, struct pnfs_block_extent,
+				      be_node);
+		list_del(&be->be_node);
+		bl_put_extent(be);
+	}
+	goto out;
 }
-- 
1.7.4.1
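decode_sector_number() and the READ branch of verify_extent() enforce two rules. Every byte value in the layout must be 512-byte aligned, and READ extents must tile the requested range without gaps. The final check in nfs4_blk_process_layoutget() then requires the tiling to end exactly at offset + length. The sketch below restates those rules in user space under simplifying assumptions. Extents carry no state, so the READWRITE/INVALID rejection is omitted. `decode_hyper()`, `bytes_to_sectors()` and `verify_read_layout()` are hypothetical helpers, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9   /* 512-byte sectors, as in the kernel */

/* Decode an XDR hyper: 8 bytes, big-endian. */
static uint64_t decode_hyper(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Byte value -> sectors; fails if not 512-byte aligned (cf. s & 0x1ff). */
static int bytes_to_sectors(uint64_t bytes, uint64_t *sectors)
{
    if (bytes & 0x1ff)
        return -1;
    *sectors = bytes >> SECTOR_SHIFT;
    return 0;
}

struct extent {
    uint64_t f_offset;   /* file offset, sectors */
    uint64_t length;     /* sectors */
};

/* READ-mode rule: each extent starts where the previous one ended, and
 * the whole list ends exactly at range_end (all values in sectors). */
static int verify_read_layout(const struct extent *ext, size_t n,
                              uint64_t range_start, uint64_t range_end)
{
    uint64_t start = range_start;

    assert(ext != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ext[i].f_offset != start)
            return -1;
        start += ext[i].length;
    }
    return start == range_end ? 0 : -1;
}
```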



* [PATCH v3 15/25] pnfsblock: bl_find_get_extent
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

Implement bl_find_get_extent(), one of the core extent manipulation
routines.

[pnfsblock: Lookup list entry of layouts and tags in reverse order]
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>

pnfsblock: fix print format warnings for sector_t and size_t

gcc spews warnings about these on x86_64, e.g.:
fs/nfs/blocklayout/blocklayout.c:74: warning: format ‘%Lu’ expects type ‘long long unsigned int’, but argument 2 has type ‘sector_t’
fs/nfs/blocklayout/blocklayout.c:388: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/blocklayout/blocklayout.h |    3 ++
 fs/nfs/blocklayout/extents.c     |   47 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index 3105b96..25c3153 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -153,6 +153,9 @@ int nfs4_blk_process_layoutget(struct pnfs_layout_hdr *lo,
 void free_block_dev(struct pnfs_block_dev *bdev);
 
 /* extents.c */
+struct pnfs_block_extent *
+bl_find_get_extent(struct pnfs_block_layout *bl, sector_t isect,
+		struct pnfs_block_extent **cow_read);
 void bl_put_extent(struct pnfs_block_extent *be);
 struct pnfs_block_extent *alloc_extent(void);
 int bl_add_merge_extent(struct pnfs_block_layout *bl,
diff --git a/fs/nfs/blocklayout/extents.c b/fs/nfs/blocklayout/extents.c
index 3591084..c306616 100644
--- a/fs/nfs/blocklayout/extents.c
+++ b/fs/nfs/blocklayout/extents.c
@@ -201,3 +201,50 @@ bl_add_merge_extent(struct pnfs_block_layout *bl,
 	bl_put_extent(new);
 	return -EIO;
 }
+
+/* Returns extent, or NULL.  If a second READ extent exists, it is returned
+ * in cow_read, if given.
+ *
+ * The extents are kept in two separate ordered lists, one for READ and NONE,
+ * one for READWRITE and INVALID.  Within each list, we assume:
+ * 1. Extents are ordered by file offset.
+ * 2. For any given isect, there is at most one extent that matches.
+ */
+struct pnfs_block_extent *
+bl_find_get_extent(struct pnfs_block_layout *bl, sector_t isect,
+	    struct pnfs_block_extent **cow_read)
+{
+	struct pnfs_block_extent *be, *cow, *ret;
+	int i;
+
+	dprintk("%s enter with isect %llu\n", __func__, (u64)isect);
+	cow = ret = NULL;
+	spin_lock(&bl->bl_ext_lock);
+	for (i = 0; i < EXTENT_LISTS; i++) {
+		list_for_each_entry_reverse(be, &bl->bl_extents[i], be_node) {
+			if (isect >= be->be_f_offset + be->be_length)
+				break;
+			if (isect >= be->be_f_offset) {
+				/* We have found an extent */
+				dprintk("%s Get %p (%i)\n", __func__, be,
+					atomic_read(&be->be_refcnt.refcount));
+				kref_get(&be->be_refcnt);
+				if (!ret)
+					ret = be;
+				else if (be->be_state != PNFS_BLOCK_READ_DATA)
+					bl_put_extent(be);
+				else
+					cow = be;
+				break;
+			}
+		}
+		if (ret &&
+		    (!cow_read || ret->be_state != PNFS_BLOCK_INVALID_DATA))
+			break;
+	}
+	spin_unlock(&bl->bl_ext_lock);
+	if (cow_read)
+		*cow_read = cow;
+	print_bl_extent(ret);
+	return ret;
+}
-- 
1.7.4.1
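bl_find_get_extent() walks each extent list from the tail. It can stop early because the lists are sorted by file offset: once isect lies at or beyond the end of an extent, no earlier extent can contain it. The following is a minimal sketch of that single-list scan. It assumes a sorted, non-overlapping array instead of a `list_head`, and it ignores the second list, the refcounting and the `cow_read` handling. `find_extent()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct extent {
    uint64_t f_offset;   /* first sector covered */
    uint64_t length;     /* number of sectors */
};

/*
 * Return the index of the extent containing isect, or -1 if none does.
 * ext[] must be sorted by f_offset and non-overlapping. The scan runs
 * from the tail, like list_for_each_entry_reverse() in the patch.
 */
static long find_extent(const struct extent *ext, size_t n, uint64_t isect)
{
    assert(ext != NULL || n == 0);
    for (size_t i = n; i-- > 0; ) {
        if (isect >= ext[i].f_offset + ext[i].length)
            break;              /* past this extent: earlier ones end even sooner */
        if (isect >= ext[i].f_offset)
            return (long)i;     /* f_offset <= isect < f_offset + length */
    }
    return -1;
}
```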



* [PATCH v3 16/25] pnfsblock: add extent manipulation functions
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

Add working implementations of various support functions that handle
INVAL extents and are needed by writes, such as bl_mark_sectors_init
and is_sector_initialized.

[pnfsblock: fix 64-bit compiler warnings for extent manipulation]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[Implement release_inval_marks]
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
---
 fs/nfs/blocklayout/blocklayout.c |    7 +-
 fs/nfs/blocklayout/blocklayout.h |   31 +++++-
 fs/nfs/blocklayout/extents.c     |  253 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 288 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 667bc50..eb18b28 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -78,10 +78,15 @@ release_extents(struct pnfs_block_layout *bl, struct pnfs_layout_range *range)
 	spin_unlock(&bl->bl_ext_lock);
 }
 
-/* STUB */
 static void
 release_inval_marks(struct pnfs_inval_markings *marks)
 {
+	struct pnfs_inval_tracking *pos, *temp;
+
+	list_for_each_entry_safe(pos, temp, &marks->im_tree.mtt_stub, it_link) {
+		list_del(&pos->it_link);
+		kfree(pos);
+	}
 	return;
 }
 
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index 25c3153..c002aa2 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -38,6 +38,9 @@
 
 #include "../pnfs.h"
 
+#define PAGE_CACHE_SECTORS (PAGE_CACHE_SIZE >> SECTOR_SHIFT)
+#define PAGE_CACHE_SECTOR_SHIFT (PAGE_CACHE_SHIFT - SECTOR_SHIFT)
+
 struct block_mount_id {
 	spinlock_t			bm_lock;    /* protects list */
 	struct list_head		bm_devlist; /* holds pnfs_block_dev */
@@ -56,8 +59,23 @@ enum exstate4 {
 	PNFS_BLOCK_NONE_DATA		= 3  /* unmapped, it's a hole */
 };
 
+#define MY_MAX_TAGS (15) /* tag bitnums used must be less than this */
+
+struct my_tree {
+	sector_t		mtt_step_size;	/* Internal sector alignment */
+	struct list_head	mtt_stub; /* Should be a radix tree */
+};
+
 struct pnfs_inval_markings {
-	/* STUB */
+	spinlock_t	im_lock;
+	struct my_tree	im_tree;	/* Sectors that need LAYOUTCOMMIT */
+	sector_t	im_block_size;	/* Server blocksize in sectors */
+};
+
+struct pnfs_inval_tracking {
+	struct list_head it_link;
+	int		 it_sector;
+	int		 it_tags;
 };
 
 /* sector_t fields are all in 512-byte sectors */
@@ -76,7 +94,11 @@ struct pnfs_block_extent {
 static inline void
 INIT_INVAL_MARKS(struct pnfs_inval_markings *marks, sector_t blocksize)
 {
-	/* STUB */
+	spin_lock_init(&marks->im_lock);
+	INIT_LIST_HEAD(&marks->im_tree.mtt_stub);
+	marks->im_block_size = blocksize;
+	marks->im_tree.mtt_step_size = min((sector_t)PAGE_CACHE_SECTORS,
+					   blocksize);
 }
 
 enum extentclass4 {
@@ -156,8 +178,13 @@ void free_block_dev(struct pnfs_block_dev *bdev);
 struct pnfs_block_extent *
 bl_find_get_extent(struct pnfs_block_layout *bl, sector_t isect,
 		struct pnfs_block_extent **cow_read);
+int bl_mark_sectors_init(struct pnfs_inval_markings *marks,
+			     sector_t offset, sector_t length,
+			     sector_t **pages);
 void bl_put_extent(struct pnfs_block_extent *be);
 struct pnfs_block_extent *alloc_extent(void);
+struct pnfs_block_extent *get_extent(struct pnfs_block_extent *be);
+int is_sector_initialized(struct pnfs_inval_markings *marks, sector_t isect);
 int bl_add_merge_extent(struct pnfs_block_layout *bl,
 			 struct pnfs_block_extent *new);
 
diff --git a/fs/nfs/blocklayout/extents.c b/fs/nfs/blocklayout/extents.c
index c306616..3528d36 100644
--- a/fs/nfs/blocklayout/extents.c
+++ b/fs/nfs/blocklayout/extents.c
@@ -33,6 +33,259 @@
 #include "blocklayout.h"
 #define NFSDBG_FACILITY         NFSDBG_PNFS_LD
 
+/* Bit numbers */
+#define EXTENT_INITIALIZED 0
+#define EXTENT_WRITTEN     1
+#define EXTENT_IN_COMMIT   2
+#define INTERNAL_EXISTS    MY_MAX_TAGS
+#define INTERNAL_MASK      ((1 << INTERNAL_EXISTS) - 1)
+
+/* Returns largest t<=s s.t. t%base==0 */
+static inline sector_t normalize(sector_t s, int base)
+{
+	sector_t tmp = s; /* Since do_div modifies its argument */
+	return s - do_div(tmp, base);
+}
+
+static inline sector_t normalize_up(sector_t s, int base)
+{
+	return normalize(s + base - 1, base);
+}
+
+/* Complete stub using a list while we determine the API wanted */
+
+/* Returns tags, or negative */
+static int32_t _find_entry(struct my_tree *tree, u64 s)
+{
+	struct pnfs_inval_tracking *pos;
+
+	dprintk("%s(%llu) enter\n", __func__, s);
+	list_for_each_entry_reverse(pos, &tree->mtt_stub, it_link) {
+		if (pos->it_sector > s)
+			continue;
+		else if (pos->it_sector == s)
+			return pos->it_tags & INTERNAL_MASK;
+		else
+			break;
+	}
+	return -ENOENT;
+}
+
+static inline
+int _has_tag(struct my_tree *tree, u64 s, int32_t tag)
+{
+	int32_t tags;
+
+	dprintk("%s(%llu, %i) enter\n", __func__, s, tag);
+	s = normalize(s, tree->mtt_step_size);
+	tags = _find_entry(tree, s);
+	if ((tags < 0) || !(tags & (1 << tag)))
+		return 0;
+	else
+		return 1;
+}
+
+/* Creates entry with tag, or if entry already exists, unions tag to it.
+ * If storage is not NULL, newly created entry will use it.
+ * Returns number of entries added, or negative on error.
+ */
+static int _add_entry(struct my_tree *tree, u64 s, int32_t tag,
+		      struct pnfs_inval_tracking *storage)
+{
+	int found = 0;
+	struct pnfs_inval_tracking *pos;
+
+	dprintk("%s(%llu, %i, %p) enter\n", __func__, s, tag, storage);
+	list_for_each_entry_reverse(pos, &tree->mtt_stub, it_link) {
+		if (pos->it_sector > s)
+			continue;
+		else if (pos->it_sector == s) {
+			found = 1;
+			break;
+		} else
+			break;
+	}
+	if (found) {
+		pos->it_tags |= (1 << tag);
+		return 0;
+	} else {
+		struct pnfs_inval_tracking *new;
+		if (storage)
+			new = storage;
+		else {
+			new = kmalloc(sizeof(*new), GFP_NOFS);
+			if (!new)
+				return -ENOMEM;
+		}
+		new->it_sector = s;
+		new->it_tags = (1 << tag);
+		list_add(&new->it_link, &pos->it_link);
+		return 1;
+	}
+}
+
+/* XXXX Really want option to not create */
+/* Over range, unions tag with existing entries, else creates entry with tag */
+static int _set_range(struct my_tree *tree, int32_t tag, u64 s, u64 length)
+{
+	u64 i;
+
+	dprintk("%s(%i, %llu, %llu) enter\n", __func__, tag, s, length);
+	for (i = normalize(s, tree->mtt_step_size); i < s + length;
+	     i += tree->mtt_step_size)
+		if (_add_entry(tree, i, tag, NULL))
+			return -ENOMEM;
+	return 0;
+}
+
+/* Ensure that future operations on given range of tree will not malloc */
+static int _preload_range(struct my_tree *tree, u64 offset, u64 length)
+{
+	u64 start, end, s;
+	int count, i, used = 0, status = -ENOMEM;
+	struct pnfs_inval_tracking **storage;
+
+	dprintk("%s(%llu, %llu) enter\n", __func__, offset, length);
+	start = normalize(offset, tree->mtt_step_size);
+	end = normalize_up(offset + length, tree->mtt_step_size);
+	count = (int)(end - start) / (int)tree->mtt_step_size;
+
+	/* Pre-malloc what memory we might need */
+	storage = kmalloc(sizeof(*storage) * count, GFP_NOFS);
+	if (!storage)
+		return -ENOMEM;
+	for (i = 0; i < count; i++) {
+		storage[i] = kmalloc(sizeof(struct pnfs_inval_tracking),
+				     GFP_NOFS);
+		if (!storage[i])
+			goto out_cleanup;
+	}
+
+	/* Now need lock - HOW??? */
+
+	for (s = start; s < end; s += tree->mtt_step_size)
+		used += _add_entry(tree, s, INTERNAL_EXISTS, storage[used]);
+
+	/* Unlock - HOW??? */
+	status = 0;
+
+ out_cleanup:
+	for (i = used; i < count; i++) {
+		if (!storage[i])
+			break;
+		kfree(storage[i]);
+	}
+	kfree(storage);
+	return status;
+}
+
+static void set_needs_init(sector_t *array, sector_t offset)
+{
+	sector_t *p = array;
+
+	dprintk("%s enter\n", __func__);
+	if (!p)
+		return;
+	while (*p < offset)
+		p++;
+	if (*p == offset)
+		return;
+	else if (*p == ~0) {
+		*p++ = offset;
+		*p = ~0;
+		return;
+	} else {
+		sector_t *save = p;
+		dprintk("%s Adding %llu\n", __func__, (u64)offset);
+		while (*p != ~0)
+			p++;
+		p++;
+		memmove(save + 1, save, (char *)p - (char *)save);
+		*save = offset;
+		return;
+	}
+}
+
+/* We are relying on page lock to serialize this */
+int is_sector_initialized(struct pnfs_inval_markings *marks, sector_t isect)
+{
+	int rv;
+
+	spin_lock(&marks->im_lock);
+	rv = _has_tag(&marks->im_tree, isect, EXTENT_INITIALIZED);
+	spin_unlock(&marks->im_lock);
+	return rv;
+}
+
+/* Marks sectors in [offset, offset + length) as having been initialized.
+ * All lengths are step-aligned, where step is min(pagesize, blocksize).
+ * Notes where partial block is initialized, and helps prepare it for
+ * complete initialization later.
+ */
+/* Currently assumes offset is page-aligned */
+int bl_mark_sectors_init(struct pnfs_inval_markings *marks,
+			     sector_t offset, sector_t length,
+			     sector_t **pages)
+{
+	sector_t s, start, end;
+	sector_t *array = NULL; /* Pages to mark */
+
+	dprintk("%s(offset=%llu,len=%llu) enter\n",
+		__func__, (u64)offset, (u64)length);
+	s = max((sector_t) 3,
+		2 * (marks->im_block_size / (PAGE_CACHE_SECTORS)));
+	dprintk("%s set max=%llu\n", __func__, (u64)s);
+	if (pages) {
+		array = kmalloc(s * sizeof(sector_t), GFP_NOFS);
+		if (!array)
+			goto outerr;
+		array[0] = ~0;
+	}
+
+	start = normalize(offset, marks->im_block_size);
+	end = normalize_up(offset + length, marks->im_block_size);
+	if (_preload_range(&marks->im_tree, start, end - start))
+		goto outerr;
+
+	spin_lock(&marks->im_lock);
+
+	for (s = normalize_up(start, PAGE_CACHE_SECTORS);
+	     s < offset; s += PAGE_CACHE_SECTORS) {
+		dprintk("%s pre-area pages\n", __func__);
+		/* Portion of used block is not initialized */
+		if (!_has_tag(&marks->im_tree, s, EXTENT_INITIALIZED))
+			set_needs_init(array, s);
+	}
+	if (_set_range(&marks->im_tree, EXTENT_INITIALIZED, offset, length))
+		goto out_unlock;
+	for (s = normalize_up(offset + length, PAGE_CACHE_SECTORS);
+	     s < end; s += PAGE_CACHE_SECTORS) {
+		dprintk("%s post-area pages\n", __func__);
+		if (!_has_tag(&marks->im_tree, s, EXTENT_INITIALIZED))
+			set_needs_init(array, s);
+	}
+
+	spin_unlock(&marks->im_lock);
+
+	if (pages) {
+		if (array[0] == ~0) {
+			kfree(array);
+			*pages = NULL;
+		} else
+			*pages = array;
+	}
+	return 0;
+
+ out_unlock:
+	spin_unlock(&marks->im_lock);
+ outerr:
+	if (pages) {
+		kfree(array);
+		*pages = NULL;
+	}
+	return -ENOMEM;
+}
+
 static void print_bl_extent(struct pnfs_block_extent *be)
 {
 	dprintk("PRINT EXTENT extent %p\n", be);
-- 
1.7.4.1
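The invalid-extent tracking in this patch rests on two small pieces. normalize() and normalize_up() round a sector down or up to a multiple of a step. set_needs_init() keeps a `~0`-terminated array of page-aligned sectors sorted and free of duplicates. The user-space sketch below reproduces both under stated assumptions. `sector_t` is typedef'd to `uint64_t`, `%` stands in for `do_div()`, and `END_MARK` is a hypothetical name for the `~0` terminator. The caller is assumed to have sized the array; bl_mark_sectors_init() allocates max(3, 2 * blocksize / PAGE_CACHE_SECTORS) entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t sector_t;
#define END_MARK (~(sector_t)0)   /* array terminator, like ~0 in the patch */

/* Largest t <= s with t % base == 0. */
static sector_t normalize(sector_t s, sector_t base)
{
    assert(base > 0);
    return s - s % base;
}

/* Smallest t >= s with t % base == 0. */
static sector_t normalize_up(sector_t s, sector_t base)
{
    return normalize(s + base - 1, base);
}

/* Insert offset into an ascending, END_MARK-terminated array unless it is
 * already present. The caller guarantees room for one more element. */
static void set_needs_init(sector_t *array, sector_t offset)
{
    sector_t *p = array, *save;

    if (!p)
        return;
    while (*p < offset)
        p++;
    if (*p == offset)
        return;                 /* already recorded */
    save = p;
    while (*p != END_MARK)
        p++;
    p++;                        /* include the terminator in the move */
    memmove(save + 1, save, (size_t)((char *)p - (char *)save));
    *save = offset;
}
```

Unlike the patch, the sketch folds the append case (`*p == ~0`) into the general memmove() path, which handles it identically by shifting only the terminator.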



* [PATCH v3 17/25] pnfsblock: merge rw extents
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/blocklayout/extents.c |   47 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/blocklayout/extents.c b/fs/nfs/blocklayout/extents.c
index 3528d36..7beae7c 100644
--- a/fs/nfs/blocklayout/extents.c
+++ b/fs/nfs/blocklayout/extents.c
@@ -501,3 +501,50 @@ bl_find_get_extent(struct pnfs_block_layout *bl, sector_t isect,
 	print_bl_extent(ret);
 	return ret;
 }
+
+/* Helper function for set_to_rw that initializes a new extent */
+static void
+_prep_new_extent(struct pnfs_block_extent *new,
+		 struct pnfs_block_extent *orig,
+		 sector_t offset, sector_t length, int state)
+{
+	kref_init(&new->be_refcnt);
+	/* don't need to INIT_LIST_HEAD(&new->be_node) */
+	memcpy(&new->be_devid, &orig->be_devid, sizeof(struct nfs4_deviceid));
+	new->be_mdev = orig->be_mdev;
+	new->be_f_offset = offset;
+	new->be_length = length;
+	new->be_v_offset = orig->be_v_offset - orig->be_f_offset + offset;
+	new->be_state = state;
+	new->be_inval = orig->be_inval;
+}
+
+/* Tries to merge be with extent in front of it in list.
+ * Frees storage if not used.
+ */
+static struct pnfs_block_extent *
+_front_merge(struct pnfs_block_extent *be, struct list_head *head,
+	     struct pnfs_block_extent *storage)
+{
+	struct pnfs_block_extent *prev;
+
+	if (!storage)
+		goto no_merge;
+	if (&be->be_node == head || be->be_node.prev == head)
+		goto no_merge;
+	prev = list_entry(be->be_node.prev, struct pnfs_block_extent, be_node);
+	if ((prev->be_f_offset + prev->be_length != be->be_f_offset) ||
+	    !extents_consistent(prev, be))
+		goto no_merge;
+	_prep_new_extent(storage, prev, prev->be_f_offset,
+			 prev->be_length + be->be_length, prev->be_state);
+	list_replace(&prev->be_node, &storage->be_node);
+	bl_put_extent(prev);
+	list_del(&be->be_node);
+	bl_put_extent(be);
+	return storage;
+
+ no_merge:
+	kfree(storage);
+	return be;
+}
-- 
1.7.4.1
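_front_merge() replaces two adjacent extents with a single freshly prepared one. _prep_new_extent() preserves the file-to-volume mapping by shifting be_v_offset by the same amount as be_f_offset. The following sketch shows both steps on plain structs. extents_consistent() is not part of this patch, so the `consistent()` rule used here is an assumption: same state, same device, and volume offsets advancing in step with file offsets. Any special cases in the real helper are not modelled, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

struct extent {
    int      dev;        /* stand-in for be_mdev */
    sector_t f_offset;   /* file offset, sectors */
    sector_t length;     /* sectors */
    sector_t v_offset;   /* volume offset, sectors */
    int      state;
};

/* Like _prep_new_extent(): describe [offset, offset + length) using the
 * same file->volume mapping as orig. */
static struct extent prep_new_extent(const struct extent *orig,
                                     sector_t offset, sector_t length,
                                     int state)
{
    struct extent e = *orig;

    assert(length > 0);
    e.f_offset = offset;
    e.length = length;
    e.v_offset = orig->v_offset - orig->f_offset + offset;
    e.state = state;
    return e;
}

/* Assumed consistency rule (hypothetical stand-in for extents_consistent()). */
static int consistent(const struct extent *prev, const struct extent *be)
{
    return prev->state == be->state && prev->dev == be->dev &&
           be->v_offset - prev->v_offset == be->f_offset - prev->f_offset;
}

/* Returns 1 and stores the merged extent in *out if prev directly precedes
 * be and the two are consistent; otherwise returns 0. */
static int front_merge(const struct extent *prev, const struct extent *be,
                       struct extent *out)
{
    if (prev->f_offset + prev->length != be->f_offset ||
        !consistent(prev, be))
        return 0;
    *out = prep_new_extent(prev, prev->f_offset,
                           prev->length + be->length, prev->state);
    return 1;
}
```

Returning the merged extent by value sidesteps the pre-allocated `storage` argument, and the kfree() of unused storage, that the kernel version has to manage.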



* [PATCH v3 18/25] pnfsblock: encode_layoutcommit
From: Jim Rees @ 2011-07-27 18:40 UTC
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

In the blocklayout driver, two things happen during
layoutcommit/cleanup:
1. The modified extents are encoded.
2. On cleanup, the extents are put back on the layout's rw
   extents list, for reads.

In the new scheme, where the actual XDR encoding is done by
encode_layoutcommit() directly into the XDR buffer, the
commit stages are:

1. On setup_layoutcommit, the range is adjusted as before
   and a structure is allocated for communication with
   bl_encode_layoutcommit and bl_cleanup_layoutcommit
   (the generic layer provides a void * to hang it on).

2. bl_encode_layoutcommit is called to do the actual
   encoding directly into the XDR buffer. The commit-extent-list
   is not freed; it is stored in the above structure.
   FIXME: The code is not yet converted to the new XDR cleanup.

3. On cleanup, the commit-extent-list is put back by a call
   to set_to_rw() as before, but without first XDR-decoding
   the list. The commit-extent-list is then freed, and
   finally the allocated structure is freed.
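In stage 2, the extents are written straight into the XDR stream. The following userspace sketch shows the byte layout that encode_pnfs_block_layoutupdate() in this patch produces. It starts with an opaque length word and an extent count. Each extent then contributes a 16-byte device ID, three 64-bit values (file offset and length, converted from sectors to bytes, and a zero storage offset) and a 32-bit state. The helper names and the value 0 for PNFS_BLOCK_READWRITE_DATA are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEVID_SIZE	16	/* NFS4_DEVICEID4_SIZE */
#define SECTOR_SHIFT	9
#define RW_DATA		0	/* ASSUMED value of PNFS_BLOCK_READWRITE_DATA */

/* Hypothetical stand-in for struct pnfs_block_short_extent. */
struct short_ext {
	unsigned char devid[DEVID_SIZE];
	uint64_t f_offset;	/* in sectors */
	uint64_t length;	/* in sectors */
};

static unsigned char *put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
	return p + 4;
}

static unsigned char *put_be64(unsigned char *p, uint64_t v)
{
	p = put_be32(p, (uint32_t)(v >> 32));
	return put_be32(p, (uint32_t)v);
}

/* Encode n extents in the same order as encode_pnfs_block_layoutupdate().
 * buf must hold 8 + 44 * n bytes; returns the number of bytes written. */
static size_t encode_layoutupdate(unsigned char *buf,
				  const struct short_ext *e, uint32_t n)
{
	unsigned char *p = buf + 8;	/* skip opaque length and count words */

	assert(buf != NULL && (n == 0 || e != NULL));
	for (uint32_t i = 0; i < n; i++) {
		memcpy(p, e[i].devid, DEVID_SIZE);
		p += DEVID_SIZE;
		p = put_be64(p, e[i].f_offset << SECTOR_SHIFT);	/* bytes */
		p = put_be64(p, e[i].length << SECTOR_SHIFT);	/* bytes */
		p = put_be64(p, 0);				/* storage offset */
		p = put_be32(p, RW_DATA);			/* extent state */
	}
	/* The opaque length covers everything after the length word itself. */
	put_be32(buf, (uint32_t)(p - buf - 4));
	put_be32(buf + 4, n);
	return (size_t)(p - buf);
}
```

Each extent therefore takes 44 bytes, which matches the `7 * 4 + sizeof(lce->bse_devid.data)` reservation in the kernel loop.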

[pnfsblock: get rid of deprecated xdr macros]
Signed-off-by: Jim Rees <rees@umich.edu>
[pnfsblock: SQUASHME: port block layout code]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[blocklayout: encode_layoutcommit implementation]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[pnfsblock: fix bug setting up layoutcommit.]
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
[pnfsblock: prevent commit list corruption]
[pnfsblock: fix layoutcommit with an empty opaque]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/blocklayout/blocklayout.c |    2 +
 fs/nfs/blocklayout/blocklayout.h |   12 +++
 fs/nfs/blocklayout/extents.c     |  176 ++++++++++++++++++++++++++++----------
 3 files changed, 146 insertions(+), 44 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index eb18b28..0c05f8a 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -155,6 +155,8 @@ static void
 bl_encode_layoutcommit(struct pnfs_layout_hdr *lo, struct xdr_stream *xdr,
 		       const struct nfs4_layoutcommit_args *arg)
 {
+	dprintk("%s enter\n", __func__);
+	encode_pnfs_block_layoutupdate(BLK_LO2EXT(lo), xdr, arg);
 }
 
 static void
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index c002aa2..de908da 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -91,6 +91,15 @@ struct pnfs_block_extent {
 	struct pnfs_inval_markings *be_inval; /* tracks INVAL->RW transition */
 };
 
+/* Shortened extent used by LAYOUTCOMMIT */
+struct pnfs_block_short_extent {
+	struct list_head bse_node;
+	struct nfs4_deviceid bse_devid;
+	struct block_device *bse_mdev;
+	sector_t	bse_f_offset;	/* the starting offset in the file */
+	sector_t	bse_length;	/* the size of the extent */
+};
+
 static inline void
 INIT_INVAL_MARKS(struct pnfs_inval_markings *marks, sector_t blocksize)
 {
@@ -185,6 +194,9 @@ void bl_put_extent(struct pnfs_block_extent *be);
 struct pnfs_block_extent *alloc_extent(void);
 struct pnfs_block_extent *get_extent(struct pnfs_block_extent *be);
 int is_sector_initialized(struct pnfs_inval_markings *marks, sector_t isect);
+int encode_pnfs_block_layoutupdate(struct pnfs_block_layout *bl,
+				   struct xdr_stream *xdr,
+				   const struct nfs4_layoutcommit_args *arg);
 int bl_add_merge_extent(struct pnfs_block_layout *bl,
 			 struct pnfs_block_extent *new);
 
diff --git a/fs/nfs/blocklayout/extents.c b/fs/nfs/blocklayout/extents.c
index 7beae7c..b46c8be 100644
--- a/fs/nfs/blocklayout/extents.c
+++ b/fs/nfs/blocklayout/extents.c
@@ -286,6 +286,49 @@ int bl_mark_sectors_init(struct pnfs_inval_markings *marks,
 	return -ENOMEM;
 }
 
+/* Marks sectors in [offset, offset+length) as having been written to disk.
+ * All lengths should be block aligned.
+ */
+int mark_written_sectors(struct pnfs_inval_markings *marks,
+			 sector_t offset, sector_t length)
+{
+	int status;
+
+	dprintk("%s(offset=%llu,len=%llu) enter\n", __func__,
+		(u64)offset, (u64)length);
+	spin_lock(&marks->im_lock);
+	status = _set_range(&marks->im_tree, EXTENT_WRITTEN, offset, length);
+	spin_unlock(&marks->im_lock);
+	return status;
+}
+
+static void print_short_extent(struct pnfs_block_short_extent *be)
+{
+	dprintk("PRINT SHORT EXTENT extent %p\n", be);
+	if (be) {
+		dprintk("        be_f_offset %llu\n", (u64)be->bse_f_offset);
+		dprintk("        be_length   %llu\n", (u64)be->bse_length);
+	}
+}
+
+void print_clist(struct list_head *list, unsigned int count)
+{
+	struct pnfs_block_short_extent *be;
+	unsigned int i = 0;
+
+	ifdebug(FACILITY) {
+		printk(KERN_DEBUG "****************\n");
+		printk(KERN_DEBUG "Extent list looks like:\n");
+		list_for_each_entry(be, list, bse_node) {
+			i++;
+			print_short_extent(be);
+		}
+		if (i != count)
+			printk(KERN_DEBUG "\n\nExpected %u entries\n\n\n", count);
+		printk(KERN_DEBUG "****************\n");
+	}
+}
+
 static void print_bl_extent(struct pnfs_block_extent *be)
 {
 	dprintk("PRINT EXTENT extent %p\n", be);
@@ -386,65 +429,67 @@ bl_add_merge_extent(struct pnfs_block_layout *bl,
 	/* Scan for proper place to insert, extending new to the left
 	 * as much as possible.
 	 */
-	list_for_each_entry_safe(be, tmp, list, be_node) {
-		if (new->be_f_offset < be->be_f_offset)
+	list_for_each_entry_safe_reverse(be, tmp, list, be_node) {
+		if (new->be_f_offset >= be->be_f_offset + be->be_length)
 			break;
-		if (end <= be->be_f_offset + be->be_length) {
-			/* new is a subset of existing be*/
+		if (new->be_f_offset >= be->be_f_offset) {
+			if (end <= be->be_f_offset + be->be_length) {
+				/* new is a subset of existing be*/
+				if (extents_consistent(be, new)) {
+					dprintk("%s: new is subset, ignoring\n",
+						__func__);
+					bl_put_extent(new);
+					return 0;
+				} else {
+					goto out_err;
+				}
+			} else {
+				/* |<--   be   -->|
+				 *          |<--   new   -->| */
+				if (extents_consistent(be, new)) {
+					/* extend new to fully replace be */
+					new->be_length += new->be_f_offset -
+						be->be_f_offset;
+					new->be_f_offset = be->be_f_offset;
+					new->be_v_offset = be->be_v_offset;
+					dprintk("%s: removing %p\n", __func__, be);
+					list_del(&be->be_node);
+					bl_put_extent(be);
+				} else {
+					goto out_err;
+				}
+			}
+		} else if (end >= be->be_f_offset + be->be_length) {
+			/* new extent overlaps existing be */
 			if (extents_consistent(be, new)) {
-				dprintk("%s: new is subset, ignoring\n",
-					__func__);
-				bl_put_extent(new);
-				return 0;
-			} else
+				/* extend new to fully replace be */
+				dprintk("%s: removing %p\n", __func__, be);
+				list_del(&be->be_node);
+				bl_put_extent(be);
+			} else {
 				goto out_err;
-		} else if (new->be_f_offset <=
-				be->be_f_offset + be->be_length) {
-			/* new overlaps or abuts existing be */
-			if (extents_consistent(be, new)) {
+			}
+		} else if (end > be->be_f_offset) {
+			/*           |<--   be   -->|
+			 *|<--   new   -->| */
+			if (extents_consistent(new, be)) {
 				/* extend new to fully replace be */
-				new->be_length += new->be_f_offset -
-						  be->be_f_offset;
-				new->be_f_offset = be->be_f_offset;
-				new->be_v_offset = be->be_v_offset;
+				new->be_length += be->be_f_offset + be->be_length -
+					new->be_f_offset - new->be_length;
 				dprintk("%s: removing %p\n", __func__, be);
 				list_del(&be->be_node);
 				bl_put_extent(be);
-			} else if (new->be_f_offset !=
-				   be->be_f_offset + be->be_length)
+			} else {
 				goto out_err;
+			}
 		}
 	}
 	/* Note that if we never hit the above break, be will not point to a
 	 * valid extent.  However, in that case &be->be_node==list.
 	 */
-	list_add_tail(&new->be_node, &be->be_node);
+	list_add(&new->be_node, &be->be_node);
 	dprintk("%s: inserting new\n", __func__);
 	print_elist(list);
-	/* Scan forward for overlaps.  If we find any, extend new and
-	 * remove the overlapped extent.
-	 */
-	be = list_prepare_entry(new, list, be_node);
-	list_for_each_entry_safe_continue(be, tmp, list, be_node) {
-		if (end < be->be_f_offset)
-			break;
-		/* new overlaps or abuts existing be */
-		if (extents_consistent(be, new)) {
-			if (end < be->be_f_offset + be->be_length) {
-				/* extend new to fully cover be */
-				end = be->be_f_offset + be->be_length;
-				new->be_length = end - new->be_f_offset;
-			}
-			dprintk("%s: removing %p\n", __func__, be);
-			list_del(&be->be_node);
-			bl_put_extent(be);
-		} else if (end != be->be_f_offset) {
-			list_del(&new->be_node);
-			goto out_err;
-		}
-	}
-	dprintk("%s: after merging\n", __func__);
-	print_elist(list);
 	/* FIXME - The per-list consistency checks have all been done,
 	 * should now check cross-list consistency.
 	 */
@@ -502,6 +547,49 @@ bl_find_get_extent(struct pnfs_block_layout *bl, sector_t isect,
 	return ret;
 }
 
+int
+encode_pnfs_block_layoutupdate(struct pnfs_block_layout *bl,
+			       struct xdr_stream *xdr,
+			       const struct nfs4_layoutcommit_args *arg)
+{
+	struct pnfs_block_short_extent *lce, *save;
+	unsigned int count = 0;
+	__be32 *p, *xdr_start;
+
+	dprintk("%s enter\n", __func__);
+	/* BUG - creation of bl_commit is buggy - need to wait for
+	 * entire block to be marked WRITTEN before it can be added.
+	 */
+	spin_lock(&bl->bl_ext_lock);
+	/* Want to adjust for possible truncate */
+	/* We now want to adjust argument range */
+
+	/* XDR encode the ranges found */
+	xdr_start = xdr_reserve_space(xdr, 8);
+	if (!xdr_start)
+		goto out;
+	list_for_each_entry_safe(lce, save, &bl->bl_commit, bse_node) {
+		p = xdr_reserve_space(xdr, 7 * 4 + sizeof(lce->bse_devid.data));
+		if (!p)
+			break;
+		p = xdr_encode_opaque_fixed(p, lce->bse_devid.data, NFS4_DEVICEID4_SIZE);
+		p = xdr_encode_hyper(p, lce->bse_f_offset << SECTOR_SHIFT);
+		p = xdr_encode_hyper(p, lce->bse_length << SECTOR_SHIFT);
+		p = xdr_encode_hyper(p, 0LL);
+		*p++ = cpu_to_be32(PNFS_BLOCK_READWRITE_DATA);
+		list_del(&lce->bse_node);
+		list_add_tail(&lce->bse_node, &bl->bl_committing);
+		bl->bl_count--;
+		count++;
+	}
+	xdr_start[0] = cpu_to_be32((xdr->p - xdr_start - 1) * 4);
+	xdr_start[1] = cpu_to_be32(count);
+out:
+	spin_unlock(&bl->bl_ext_lock);
+	dprintk("%s found %i ranges\n", __func__, count);
+	return 0;
+}
+
 /* Helper function to set_to_rw that initialize a new extent */
 static void
 _prep_new_extent(struct pnfs_block_extent *new,
-- 
1.7.4.1
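This patch replaces the forward scan in bl_add_merge_extent() with a single reverse scan. The sketch below is a rough userspace model of that scan over a sorted array of hypothetical struct ival ranges. It assumes every pair of extents is consistent, so the out_err paths never trigger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an extent: [start, start + len). */
struct ival { uint64_t start, len; };

/* Insert nv into the sorted, non-overlapping array a[0..*n) of capacity cap,
 * scanning from the tail like the reworked bl_add_merge_extent().
 * Returns 0 on success (including "already covered"), -1 if full. */
static int add_merge(struct ival *a, size_t *n, size_t cap, struct ival nv)
{
	uint64_t end = nv.start + nv.len;
	size_t i = *n;		/* candidate insert position */

	assert(*n <= cap);
	while (i > 0) {
		const struct ival *be = &a[i - 1];
		uint64_t be_end = be->start + be->len;

		if (nv.start >= be_end)
			break;			/* insert right after be */
		if (nv.start >= be->start) {
			if (end <= be_end)
				return 0;	/* nv is a subset of be */
			nv.len += nv.start - be->start;	/* extend left */
			nv.start = be->start;
		} else if (end >= be_end) {
			/* nv already covers all of be */
		} else if (end > be->start) {
			nv.len = be_end - nv.start;	/* extend right */
			end = be_end;
		} else {
			i--;			/* be lies wholly after nv */
			continue;
		}
		/* be is now covered by nv: drop it from the array */
		memmove(&a[i - 1], &a[i], (*n - i) * sizeof(*a));
		(*n)--;
		i--;
	}
	if (*n == cap)
		return -1;
	memmove(&a[i + 1], &a[i], (*n - i) * sizeof(*a));
	a[i] = nv;
	(*n)++;
	return 0;
}
```

As in the reverse scan, ranges that merely abut (such as [0, 30) and [30, 40)) are left as separate entries; only real overlaps are merged.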



* [PATCH v3 19/25] pnfsblock: cleanup_layoutcommit
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
                   ` (17 preceding siblings ...)
  2011-07-27 18:40 ` [PATCH v3 18/25] pnfsblock: encode_layoutcommit Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 18:40 ` [PATCH v3 20/25] pnfsblock: bl_read_pagelist Jim Rees
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

In the blocklayout driver, two things happen during
layoutcommit/cleanup:
1. The modified extents are encoded.
2. On cleanup, the extents are put back on the layout's rw
   extents list, for reads.

In the new scheme, where the actual XDR encoding is done by
encode_layoutcommit() directly into the XDR buffer, the
commit stages are:

1. On setup_layoutcommit, the range is adjusted as before
   and a structure is allocated for communication with
   bl_encode_layoutcommit and bl_cleanup_layoutcommit
   (the generic layer provides a void * to hang it on).

2. bl_encode_layoutcommit is called to do the actual
   encoding directly into the XDR buffer. The commit-extent-list
   is not freed; it is stored in the above structure.
   FIXME: The code is not yet converted to the new XDR cleanup.

3. On cleanup, the commit-extent-list is put back by a call
   to set_to_rw() as before, but without first XDR-decoding
   the list. The commit-extent-list is then freed, and
   finally the allocated structure is freed.
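For a successful LAYOUTCOMMIT, stage 3 is the do/while loop in clean_pnfs_block_layoutupdate() in this patch. set_to_rw() returns the end of the extent it handled, so the loop resumes from there until the committed range is covered. The sketch below uses a hypothetical fixed extent map and a fake_set_to_rw() stand-in that only reports extent boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent map: extent i covers [bounds[i], bounds[i + 1]). */
static const uint64_t bounds[] = { 0, 8, 16, 32, 64 };
#define NBOUNDS (sizeof(bounds) / sizeof(bounds[0]))

/* Stand-in for set_to_rw(): return the end of the extent containing offset.
 * If no extent is found, fall back to offset + length, like the out_nosplit
 * path of the real function (which starts with rv = offset + length). */
static uint64_t fake_set_to_rw(uint64_t offset, uint64_t length)
{
	for (size_t i = 0; i + 1 < NBOUNDS; i++)
		if (offset >= bounds[i] && offset < bounds[i + 1])
			return bounds[i + 1];
	return offset + length;
}

/* Success branch of the cleanup: walk [start, start + len) one extent at a
 * time.  Returns how many set_to_rw() calls were needed. */
static unsigned int commit_range(uint64_t start, uint64_t len)
{
	uint64_t offset = start, end = start + len;
	unsigned int calls = 0;

	assert(len > 0);
	do {
		offset = fake_set_to_rw(offset, end - offset);
		calls++;
	} while (offset < end);
	return calls;
}
```

On failure (non-zero status), the real cleanup code does not loop at all. It moves each entry back onto bl_commit with add_to_commitlist() so that a later LAYOUTCOMMIT can retry it.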

[SQUASHME: pnfs: blocklayout: port block layout code]
Signed-off-by: Peng Tao <peng_tao@emc.com>
[pnfsblock: SQUASHME: adjust to API change]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[blocklayout: encode_layoutcommit implementation]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[pnfsblock: fix bug setting up layoutcommit.]
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
[pnfsblock: cleanup_layoutcommit wants a status parameter]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/blocklayout/blocklayout.c |    2 +
 fs/nfs/blocklayout/blocklayout.h |    3 +
 fs/nfs/blocklayout/extents.c     |  210 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 215 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 0c05f8a..c63296d 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -163,6 +163,8 @@ static void
 bl_cleanup_layoutcommit(struct pnfs_layout_hdr *lo,
 			struct nfs4_layoutcommit_data *lcdata)
 {
+	dprintk("%s enter\n", __func__);
+	clean_pnfs_block_layoutupdate(BLK_LO2EXT(lo), &lcdata->args, lcdata->res.status);
 }
 
 static void free_blk_mountid(struct block_mount_id *mid)
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index de908da..79f564d 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -197,6 +197,9 @@ int is_sector_initialized(struct pnfs_inval_markings *marks, sector_t isect);
 int encode_pnfs_block_layoutupdate(struct pnfs_block_layout *bl,
 				   struct xdr_stream *xdr,
 				   const struct nfs4_layoutcommit_args *arg);
+void clean_pnfs_block_layoutupdate(struct pnfs_block_layout *bl,
+				   const struct nfs4_layoutcommit_args *arg,
+				   int status);
 int bl_add_merge_extent(struct pnfs_block_layout *bl,
 			 struct pnfs_block_extent *new);
 
diff --git a/fs/nfs/blocklayout/extents.c b/fs/nfs/blocklayout/extents.c
index b46c8be..a9224a1 100644
--- a/fs/nfs/blocklayout/extents.c
+++ b/fs/nfs/blocklayout/extents.c
@@ -329,6 +329,73 @@ void print_clist(struct list_head *list, unsigned int count)
 	}
 }
 
+/* Note: In theory, we should do more checking that devids match between
+ * old and new, but if they don't, the lists are too corrupt to salvage anyway.
+ */
+/* Note this is very similar to bl_add_merge_extent */
+static void add_to_commitlist(struct pnfs_block_layout *bl,
+			      struct pnfs_block_short_extent *new)
+{
+	struct list_head *clist = &bl->bl_commit;
+	struct pnfs_block_short_extent *old, *save;
+	sector_t end = new->bse_f_offset + new->bse_length;
+
+	dprintk("%s enter\n", __func__);
+	print_short_extent(new);
+	print_clist(clist, bl->bl_count);
+	bl->bl_count++;
+	/* Scan for proper place to insert, extending new to the left
+	 * as much as possible.
+	 */
+	list_for_each_entry_safe(old, save, clist, bse_node) {
+		if (new->bse_f_offset < old->bse_f_offset)
+			break;
+		if (end <= old->bse_f_offset + old->bse_length) {
+			/* Range is already in list */
+			bl->bl_count--;
+			kfree(new);
+			return;
+		} else if (new->bse_f_offset <=
+				old->bse_f_offset + old->bse_length) {
+			/* new overlaps or abuts existing be */
+			if (new->bse_mdev == old->bse_mdev) {
+				/* extend new to fully replace old */
+				new->bse_length += new->bse_f_offset -
+						old->bse_f_offset;
+				new->bse_f_offset = old->bse_f_offset;
+				list_del(&old->bse_node);
+				bl->bl_count--;
+				kfree(old);
+			}
+		}
+	}
+	/* Note that if we never hit the above break, old will not point to a
+	 * valid extent.  However, in that case &old->bse_node==clist.
+	 */
+	list_add_tail(&new->bse_node, &old->bse_node);
+	/* Scan forward for overlaps.  If we find any, extend new and
+	 * remove the overlapped extent.
+	 */
+	old = list_prepare_entry(new, clist, bse_node);
+	list_for_each_entry_safe_continue(old, save, clist, bse_node) {
+		if (end < old->bse_f_offset)
+			break;
+		/* new overlaps or abuts old */
+		if (new->bse_mdev == old->bse_mdev) {
+			if (end < old->bse_f_offset + old->bse_length) {
+				/* extend new to fully cover old */
+				end = old->bse_f_offset + old->bse_length;
+				new->bse_length = end - new->bse_f_offset;
+			}
+			list_del(&old->bse_node);
+			bl->bl_count--;
+			kfree(old);
+		}
+	}
+	dprintk("%s: after merging\n", __func__);
+	print_clist(clist, bl->bl_count);
+}
+
 static void print_bl_extent(struct pnfs_block_extent *be)
 {
 	dprintk("PRINT EXTENT extent %p\n", be);
@@ -547,6 +614,34 @@ bl_find_get_extent(struct pnfs_block_layout *bl, sector_t isect,
 	return ret;
 }
 
+/* Similar to bl_find_get_extent, but called with lock held, and ignores cow */
+static struct pnfs_block_extent *
+bl_find_get_extent_locked(struct pnfs_block_layout *bl, sector_t isect)
+{
+	struct pnfs_block_extent *be, *ret = NULL;
+	int i;
+
+	dprintk("%s enter with isect %llu\n", __func__, (u64)isect);
+	for (i = 0; i < EXTENT_LISTS; i++) {
+		if (ret)
+			break;
+		list_for_each_entry_reverse(be, &bl->bl_extents[i], be_node) {
+			if (isect >= be->be_f_offset + be->be_length)
+				break;
+			if (isect >= be->be_f_offset) {
+				/* We have found an extent */
+				dprintk("%s Get %p (%i)\n", __func__, be,
+					atomic_read(&be->be_refcnt.refcount));
+				kref_get(&be->be_refcnt);
+				ret = be;
+				break;
+			}
+		}
+	}
+	print_bl_extent(ret);
+	return ret;
+}
+
 int
 encode_pnfs_block_layoutupdate(struct pnfs_block_layout *bl,
 			       struct xdr_stream *xdr,
@@ -636,3 +731,118 @@ _front_merge(struct pnfs_block_extent *be, struct list_head *head,
 	kfree(storage);
 	return be;
 }
+
+static u64
+set_to_rw(struct pnfs_block_layout *bl, u64 offset, u64 length)
+{
+	u64 rv = offset + length;
+	struct pnfs_block_extent *be, *e1, *e2, *e3, *new, *old;
+	struct pnfs_block_extent *children[3];
+	struct pnfs_block_extent *merge1 = NULL, *merge2 = NULL;
+	int i = 0, j;
+
+	dprintk("%s(%llu, %llu)\n", __func__, offset, length);
+	/* Create storage for up to three new extents e1, e2, e3 */
+	e1 = kmalloc(sizeof(*e1), GFP_ATOMIC);
+	e2 = kmalloc(sizeof(*e2), GFP_ATOMIC);
+	e3 = kmalloc(sizeof(*e3), GFP_ATOMIC);
+	/* BUG - we are ignoring any failure */
+	if (!e1 || !e2 || !e3)
+		goto out_nosplit;
+
+	spin_lock(&bl->bl_ext_lock);
+	be = bl_find_get_extent_locked(bl, offset);
+	rv = be->be_f_offset + be->be_length;
+	if (be->be_state != PNFS_BLOCK_INVALID_DATA) {
+		spin_unlock(&bl->bl_ext_lock);
+		goto out_nosplit;
+	}
+	/* Add e* to children, bumping e*'s krefs */
+	if (be->be_f_offset != offset) {
+		_prep_new_extent(e1, be, be->be_f_offset,
+				 offset - be->be_f_offset,
+				 PNFS_BLOCK_INVALID_DATA);
+		children[i++] = e1;
+		print_bl_extent(e1);
+	} else
+		merge1 = e1;
+	_prep_new_extent(e2, be, offset,
+			 min(length, be->be_f_offset + be->be_length - offset),
+			 PNFS_BLOCK_READWRITE_DATA);
+	children[i++] = e2;
+	print_bl_extent(e2);
+	if (offset + length < be->be_f_offset + be->be_length) {
+		_prep_new_extent(e3, be, e2->be_f_offset + e2->be_length,
+				 be->be_f_offset + be->be_length -
+				 offset - length,
+				 PNFS_BLOCK_INVALID_DATA);
+		children[i++] = e3;
+		print_bl_extent(e3);
+	} else
+		merge2 = e3;
+
+	/* Remove be from list, and insert the e* */
+	/* We don't get refs on e*, since this list is the base reference
+	 * set when init'ed.
+	 */
+	if (i < 3)
+		children[i] = NULL;
+	new = children[0];
+	list_replace(&be->be_node, &new->be_node);
+	bl_put_extent(be);
+	new = _front_merge(new, &bl->bl_extents[RW_EXTENT], merge1);
+	for (j = 1; j < i; j++) {
+		old = new;
+		new = children[j];
+		list_add(&new->be_node, &old->be_node);
+	}
+	if (merge2) {
+		/* This is a HACK, should just create a _back_merge function */
+		new = list_entry(new->be_node.next,
+				 struct pnfs_block_extent, be_node);
+		new = _front_merge(new, &bl->bl_extents[RW_EXTENT], merge2);
+	}
+	spin_unlock(&bl->bl_ext_lock);
+
+	/* Since we removed the base reference above, be is now scheduled for
+	 * destruction.
+	 */
+	bl_put_extent(be);
+	dprintk("%s returns %llu after split\n", __func__, rv);
+	return rv;
+
+ out_nosplit:
+	kfree(e1);
+	kfree(e2);
+	kfree(e3);
+	dprintk("%s returns %llu without splitting\n", __func__, rv);
+	return rv;
+}
+
+void
+clean_pnfs_block_layoutupdate(struct pnfs_block_layout *bl,
+			      const struct nfs4_layoutcommit_args *arg,
+			      int status)
+{
+	struct pnfs_block_short_extent *lce, *save;
+
+	dprintk("%s status %d\n", __func__, status);
+	list_for_each_entry_safe(lce, save, &bl->bl_committing, bse_node) {
+		if (likely(!status)) {
+			u64 offset = lce->bse_f_offset;
+			u64 end = offset + lce->bse_length;
+
+			do {
+				offset = set_to_rw(bl, offset, end - offset);
+			} while (offset < end);
+			list_del(&lce->bse_node);
+
+			kfree(lce);
+		} else {
+			list_del(&lce->bse_node);
+			spin_lock(&bl->bl_ext_lock);
+			add_to_commitlist(bl, lce);
+			spin_unlock(&bl->bl_ext_lock);
+		}
+	}
+}
-- 
1.7.4.1
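set_to_rw() splits the INVALID extent that contains the written range into up to three pieces: an INVALID head e1, the READWRITE middle e2, and an INVALID tail e3. The sketch below shows only that arithmetic. The state constants are hypothetical, and list handling and front merging are omitted.

```c
#include <assert.h>
#include <stdint.h>

enum { ST_RW, ST_INVALID };	/* hypothetical stand-ins for extent states */

struct piece { uint64_t start, len; int state; };

/* Split the INVALID extent [f, f + L) around the written range
 * [off, off + len).  Writes 1 to 3 pieces to out[] and returns the count.
 * Assumes f <= off < f + L, which holds after set_to_rw()'s lookup. */
static int split_extent(uint64_t f, uint64_t L, uint64_t off, uint64_t len,
			struct piece out[3])
{
	uint64_t be_end = f + L;
	uint64_t rw_len = len < be_end - off ? len : be_end - off;
	int i = 0;

	assert(f <= off && off < be_end && len > 0);
	if (off != f)					/* e1: INVALID head */
		out[i++] = (struct piece){ f, off - f, ST_INVALID };
	out[i++] = (struct piece){ off, rw_len, ST_RW };	/* e2 */
	if (off + len < be_end)				/* e3: INVALID tail */
		out[i++] = (struct piece){ off + rw_len, be_end - off - len,
					   ST_INVALID };
	return i;
}
```

When the head or tail is empty, the kernel code does not free the spare e1 or e3 buffer immediately. Instead it passes the buffer to _front_merge() as storage for a possible merge with the neighbouring extent, and _front_merge() frees it if no merge happens.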



* [PATCH v3 20/25] pnfsblock: bl_read_pagelist
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
                   ` (18 preceding siblings ...)
  2011-07-27 18:40 ` [PATCH v3 19/25] pnfsblock: cleanup_layoutcommit Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 20:09   ` Trond Myklebust
  2011-07-27 18:40 ` [PATCH v3 21/25] pnfsblock: bl_write_pagelist Jim Rees
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

Note: When the upper layer's read/write request cannot be fulfilled, the
block layout driver shouldn't silently mark the page as an error. It
should do what it can and leave the rest to the upper layer. To do so,
it must set rdata/wdata->res.count properly.

When the upper layer resends the read/write request to finish the rest
of the request, pgbase is the position where it should start.
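On the read side, this note is implemented at the end of bl_read_pagelist(). res.count reports how far the pages actually got, clamped to the file size, and eof is set once the end of the file is reached. Below is a standalone sketch of that computation. It assumes 512-byte sectors and f_offset <= i_size, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

struct read_result { uint64_t count; int eof; };

/* isect: first sector NOT covered by the pages just handled;
 * f_offset: byte offset where the request started; i_size: file size. */
static struct read_result compute_read_result(uint64_t isect,
					      uint64_t f_offset,
					      uint64_t i_size)
{
	struct read_result r = { 0, 0 };
	uint64_t done_to = isect << SECTOR_SHIFT;	/* sectors -> bytes */

	assert(done_to >= f_offset && f_offset <= i_size);
	if (done_to >= i_size) {
		r.eof = 1;			/* reached end of file */
		r.count = i_size - f_offset;
	} else {
		r.count = done_to - f_offset;	/* partial: caller resends */
	}
	return r;
}
```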

[pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED]
Signed-off-by: Peng Tao <peng_tao@emc.com>
[pnfsblock: read path error handling]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: handle errors when read or write pagelist.]
Signed-off-by: Zhang Jingwang <yyalone@gmail.com>
[pnfs-block: use new read_pagelist api]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/blocklayout/blocklayout.c |  282 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 282 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index c63296d..6e90371 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -29,10 +29,12 @@
  * of the software, even if it has been or is hereafter advised of the
  * possibility of such damages.
  */
+
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/mount.h>
 #include <linux/namei.h>
+#include <linux/bio.h>		/* struct bio */
 
 #include "blocklayout.h"
 
@@ -45,9 +47,289 @@ MODULE_DESCRIPTION("The NFSv4.1 pNFS Block layout driver");
 struct dentry *bl_device_pipe;
 wait_queue_head_t bl_wq;
 
+static void print_page(struct page *page)
+{
+	dprintk("PRINTPAGE page %p\n", page);
+	dprintk("	PagePrivate %d\n", PagePrivate(page));
+	dprintk("	PageUptodate %d\n", PageUptodate(page));
+	dprintk("	PageError %d\n", PageError(page));
+	dprintk("	PageDirty %d\n", PageDirty(page));
+	dprintk("	PageReferenced %d\n", PageReferenced(page));
+	dprintk("	PageLocked %d\n", PageLocked(page));
+	dprintk("	PageWriteback %d\n", PageWriteback(page));
+	dprintk("	PageMappedToDisk %d\n", PageMappedToDisk(page));
+	dprintk("\n");
+}
+
+/* Given the be associated with isect, determine if page data needs to be
+ * initialized.
+ */
+static int is_hole(struct pnfs_block_extent *be, sector_t isect)
+{
+	if (be->be_state == PNFS_BLOCK_NONE_DATA)
+		return 1;
+	else if (be->be_state != PNFS_BLOCK_INVALID_DATA)
+		return 0;
+	else
+		return !is_sector_initialized(be->be_inval, isect);
+}
+
+static int
+dont_like_caller(struct nfs_page *req)
+{
+	if (atomic_read(&req->wb_complete)) {
+		/* Called by _multi */
+		return 1;
+	} else {
+		/* Called by _one */
+		return 0;
+	}
+}
+
+/* The data we are handed might be spread across several bios.  We need
+ * to track when the last one is finished.
+ */
+struct parallel_io {
+	struct kref refcnt;
+	struct rpc_call_ops call_ops;
+	void (*pnfs_callback) (void *data);
+	void *data;
+};
+
+static inline struct parallel_io *alloc_parallel(void *data)
+{
+	struct parallel_io *rv;
+
+	rv  = kmalloc(sizeof(*rv), GFP_NOFS);
+	if (rv) {
+		rv->data = data;
+		kref_init(&rv->refcnt);
+	}
+	return rv;
+}
+
+static inline void get_parallel(struct parallel_io *p)
+{
+	kref_get(&p->refcnt);
+}
+
+static void destroy_parallel(struct kref *kref)
+{
+	struct parallel_io *p = container_of(kref, struct parallel_io, refcnt);
+
+	dprintk("%s enter\n", __func__);
+	p->pnfs_callback(p->data);
+	kfree(p);
+}
+
+static inline void put_parallel(struct parallel_io *p)
+{
+	kref_put(&p->refcnt, destroy_parallel);
+}
+
+static struct bio *
+bl_submit_bio(int rw, struct bio *bio)
+{
+	if (bio) {
+		get_parallel(bio->bi_private);
+		dprintk("%s submitting %s bio %u@%llu\n", __func__,
+			rw == READ ? "read" : "write",
+			bio->bi_size, (unsigned long long)bio->bi_sector);
+		submit_bio(rw, bio);
+	}
+	return NULL;
+}
+
+static struct bio *bl_alloc_init_bio(int npg, sector_t isect,
+				     struct pnfs_block_extent *be,
+				     void (*end_io)(struct bio *, int err),
+				     struct parallel_io *par)
+{
+	struct bio *bio;
+
+	bio = bio_alloc(GFP_NOIO, npg);
+	if (!bio)
+		return NULL;
+
+	bio->bi_sector = isect - be->be_f_offset + be->be_v_offset;
+	bio->bi_bdev = be->be_mdev;
+	bio->bi_end_io = end_io;
+	bio->bi_private = par;
+	return bio;
+}
+
+static struct bio *bl_add_page_to_bio(struct bio *bio, int npg, int rw,
+				      sector_t isect, struct page *page,
+				      struct pnfs_block_extent *be,
+				      void (*end_io)(struct bio *, int err),
+				      struct parallel_io *par)
+{
+retry:
+	if (!bio) {
+		bio = bl_alloc_init_bio(npg, isect, be, end_io, par);
+		if (!bio)
+			return ERR_PTR(-ENOMEM);
+	}
+	if (bio_add_page(bio, page, PAGE_CACHE_SIZE, 0) < PAGE_CACHE_SIZE) {
+		bio = bl_submit_bio(rw, bio);
+		goto retry;
+	}
+	return bio;
+}
+
+static void bl_set_lo_fail(struct pnfs_layout_segment *lseg)
+{
+	if (lseg->pls_range.iomode == IOMODE_RW) {
+		dprintk("%s Setting layout IOMODE_RW fail bit\n", __func__);
+		set_bit(lo_fail_bit(IOMODE_RW), &lseg->pls_layout->plh_flags);
+	} else {
+		dprintk("%s Setting layout IOMODE_READ fail bit\n", __func__);
+		set_bit(lo_fail_bit(IOMODE_READ), &lseg->pls_layout->plh_flags);
+	}
+}
+
+/* This is basically copied from mpage_end_io_read */
+static void bl_end_io_read(struct bio *bio, int err)
+{
+	struct parallel_io *par = bio->bi_private;
+	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct nfs_read_data *rdata = (struct nfs_read_data *)par->data;
+
+	do {
+		struct page *page = bvec->bv_page;
+
+		if (--bvec >= bio->bi_io_vec)
+			prefetchw(&bvec->bv_page->flags);
+		if (uptodate)
+			SetPageUptodate(page);
+	} while (bvec >= bio->bi_io_vec);
+	if (!uptodate) {
+		if (!rdata->pnfs_error)
+			rdata->pnfs_error = -EIO;
+		bl_set_lo_fail(rdata->lseg);
+	}
+	bio_put(bio);
+	put_parallel(par);
+}
+
+static void bl_read_cleanup(struct work_struct *work)
+{
+	struct rpc_task *task;
+	struct nfs_read_data *rdata;
+	dprintk("%s enter\n", __func__);
+	task = container_of(work, struct rpc_task, u.tk_work);
+	rdata = container_of(task, struct nfs_read_data, task);
+	pnfs_ld_read_done(rdata);
+}
+
+static void
+bl_end_par_io_read(void *data)
+{
+	struct nfs_read_data *rdata = data;
+
+	INIT_WORK(&rdata->task.u.tk_work, bl_read_cleanup);
+	schedule_work(&rdata->task.u.tk_work);
+}
+
+/* We don't want normal .rpc_call_done callback used, so we replace it
+ * with this stub.
+ */
+static void bl_rpc_do_nothing(struct rpc_task *task, void *calldata)
+{
+	return;
+}
+
 static enum pnfs_try_status
 bl_read_pagelist(struct nfs_read_data *rdata)
 {
+	int i, hole;
+	struct bio *bio = NULL;
+	struct pnfs_block_extent *be = NULL, *cow_read = NULL;
+	sector_t isect, extent_length = 0;
+	struct parallel_io *par;
+	loff_t f_offset = rdata->args.offset;
+	size_t count = rdata->args.count;
+	struct page **pages = rdata->args.pages;
+	int pg_index = rdata->args.pgbase >> PAGE_CACHE_SHIFT;
+
+	dprintk("%s enter nr_pages %u offset %lld count %Zd\n", __func__,
+	       rdata->npages, f_offset, count);
+
+	if (dont_like_caller(rdata->req)) {
+		dprintk("%s dont_like_caller failed\n", __func__);
+		goto use_mds;
+	}
+
+	par = alloc_parallel(rdata);
+	if (!par)
+		goto use_mds;
+	par->call_ops = *rdata->mds_ops;
+	par->call_ops.rpc_call_done = bl_rpc_do_nothing;
+	par->pnfs_callback = bl_end_par_io_read;
+	/* At this point, we can no longer jump to use_mds */
+
+	isect = (sector_t) (f_offset >> SECTOR_SHIFT);
+	/* Code assumes extents are page-aligned */
+	for (i = pg_index; i < rdata->npages; i++) {
+		if (!extent_length) {
+			/* We've used up the previous extent */
+			bl_put_extent(be);
+			bl_put_extent(cow_read);
+			bio = bl_submit_bio(READ, bio);
+			/* Get the next one */
+			be = bl_find_get_extent(BLK_LSEG2EXT(rdata->lseg),
+					     isect, &cow_read);
+			if (!be) {
+				rdata->pnfs_error = -EIO;
+				goto out;
+			}
+			extent_length = be->be_length -
+				(isect - be->be_f_offset);
+			if (cow_read) {
+				sector_t cow_length = cow_read->be_length -
+					(isect - cow_read->be_f_offset);
+				extent_length = min(extent_length, cow_length);
+			}
+		}
+		hole = is_hole(be, isect);
+		if (hole && !cow_read) {
+			bio = bl_submit_bio(READ, bio);
+			/* Fill hole w/ zeroes w/o accessing device */
+			dprintk("%s Zeroing page for hole\n", __func__);
+			zero_user_segment(pages[i], 0, PAGE_CACHE_SIZE);
+			print_page(pages[i]);
+			SetPageUptodate(pages[i]);
+		} else {
+			struct pnfs_block_extent *be_read;
+
+			be_read = (hole && cow_read) ? cow_read : be;
+			bio = bl_add_page_to_bio(bio, rdata->npages - i, READ,
+						 isect, pages[i], be_read,
+						 bl_end_io_read, par);
+			if (IS_ERR(bio)) {
+				rdata->pnfs_error = PTR_ERR(bio);
+				goto out;
+			}
+		}
+		isect += PAGE_CACHE_SECTORS;
+		extent_length -= PAGE_CACHE_SECTORS;
+	}
+	if ((isect << SECTOR_SHIFT) >= rdata->inode->i_size) {
+		rdata->res.eof = 1;
+		rdata->res.count = rdata->inode->i_size - f_offset;
+	} else {
+		rdata->res.count = (isect << SECTOR_SHIFT) - f_offset;
+	}
+out:
+	bl_put_extent(be);
+	bl_put_extent(cow_read);
+	bl_submit_bio(READ, bio);
+	put_parallel(par);
+	return PNFS_ATTEMPTED;
+
+ use_mds:
+	dprintk("Giving up and using normal NFS\n");
 	return PNFS_NOT_ATTEMPTED;
 }
 
-- 
1.7.4.1
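struct parallel_io completes a read only after its last bio finishes. The submitter holds the initial reference, bl_submit_bio() takes one reference per bio, and each bl_end_io_read() drops one. The following is a single-threaded userspace model of that pattern. Its plain int counter replaces struct kref and is not safe for concurrent use. on_done() and completions are hypothetical helpers for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model of struct parallel_io (NOT thread-safe: int, not kref). */
struct parallel_io {
	int refcnt;
	void (*callback)(void *data);	/* like pnfs_callback */
	void *data;
};

static struct parallel_io *alloc_parallel(void *data, void (*cb)(void *))
{
	struct parallel_io *p = malloc(sizeof(*p));

	if (p) {
		p->refcnt = 1;		/* the submitter's own reference */
		p->callback = cb;
		p->data = data;
	}
	return p;
}

static void get_parallel(struct parallel_io *p)
{
	p->refcnt++;			/* one per submitted bio */
}

static void put_parallel(struct parallel_io *p)
{
	assert(p->refcnt > 0);
	if (--p->refcnt == 0) {
		p->callback(p->data);	/* runs once, after the last put */
		free(p);
	}
}

static int completions;
static void on_done(void *data)
{
	(void)data;
	completions++;
}
```

This is why bl_read_pagelist() calls put_parallel(par) only after its final bl_submit_bio(). If every bio has already finished by then, that last put is what triggers bl_end_par_io_read().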



* [PATCH v3 21/25] pnfsblock: bl_write_pagelist
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
                   ` (19 preceding siblings ...)
  2011-07-27 18:40 ` [PATCH v3 20/25] pnfsblock: bl_read_pagelist Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 20:11   ` Trond Myklebust
  2011-07-27 18:40 ` [PATCH v3 22/25] pnfsblock: note written INVAL areas for layoutcommit Jim Rees
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

Note: When the upper layer's read/write request cannot be fulfilled, the block
layout driver shouldn't silently mark the page as an error. It should do
what it can and leave the rest to the upper layer. To do so, we must set
rdata/wdata->res.count properly.

When the upper layer re-sends the read/write request to finish the rest
of the request, pgbase is the position where we should start.

[pnfsblock: bl_write_pagelist support functions]
[pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: handle errors when read or write pagelist.]
Signed-off-by: Zhang Jingwang <yyalone@gmail.com>
[pnfs-block: use new write_pagelist api]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>

[SQUASHME: pnfsblock: mds_offset is set in the generic layer]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>

[pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED]
Signed-off-by: Peng Tao <peng_tao@emc.com>
[pnfsblock: SQUASHME: adjust to API change]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: fixup blksize alignment in bl_setup_layoutcommit]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/blocklayout/blocklayout.c |  137 +++++++++++++++++++++++++++++++++++++-
 1 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 6e90371..963b3ea 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -74,6 +74,19 @@ static int is_hole(struct pnfs_block_extent *be, sector_t isect)
 		return !is_sector_initialized(be->be_inval, isect);
 }
 
+/* Given the be associated with isect, determine if page data can be
+ * written to disk.
+ */
+static int is_writable(struct pnfs_block_extent *be, sector_t isect)
+{
+	if (be->be_state == PNFS_BLOCK_READWRITE_DATA)
+		return 1;
+	else if (be->be_state != PNFS_BLOCK_INVALID_DATA)
+		return 0;
+	else
+		return is_sector_initialized(be->be_inval, isect);
+}
+
 static int
 dont_like_caller(struct nfs_page *req)
 {
@@ -333,11 +346,129 @@ out:
 	return PNFS_NOT_ATTEMPTED;
 }
 
+/* This is basically copied from mpage_end_io_read */
+static void bl_end_io_write(struct bio *bio, int err)
+{
+	struct parallel_io *par = bio->bi_private;
+	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+	struct nfs_write_data *wdata = (struct nfs_write_data *)par->data;
+
+	if (!uptodate) {
+		if (!wdata->pnfs_error)
+			wdata->pnfs_error = -EIO;
+		bl_set_lo_fail(wdata->lseg);
+	}
+	bio_put(bio);
+	put_parallel(par);
+}
+
+/* Function scheduled for call during bl_end_par_io_write,
+ * it marks sectors as written and extends the commitlist.
+ */
+static void bl_write_cleanup(struct work_struct *work)
+{
+	struct rpc_task *task;
+	struct nfs_write_data *wdata;
+	dprintk("%s enter\n", __func__);
+	task = container_of(work, struct rpc_task, u.tk_work);
+	wdata = container_of(task, struct nfs_write_data, task);
+	pnfs_ld_write_done(wdata);
+}
+
+/* Called when last of bios associated with a bl_write_pagelist call finishes */
+static void
+bl_end_par_io_write(void *data)
+{
+	struct nfs_write_data *wdata = data;
+
+	/* STUB - ignoring error handling */
+	wdata->task.tk_status = 0;
+	wdata->verf.committed = NFS_FILE_SYNC;
+	INIT_WORK(&wdata->task.u.tk_work, bl_write_cleanup);
+	schedule_work(&wdata->task.u.tk_work);
+}
+
 static enum pnfs_try_status
-bl_write_pagelist(struct nfs_write_data *wdata,
-		  int sync)
+bl_write_pagelist(struct nfs_write_data *wdata, int sync)
 {
-	return PNFS_NOT_ATTEMPTED;
+	int i;
+	struct bio *bio = NULL;
+	struct pnfs_block_extent *be = NULL;
+	sector_t isect, extent_length = 0;
+	struct parallel_io *par;
+	loff_t offset = wdata->args.offset;
+	size_t count = wdata->args.count;
+	struct page **pages = wdata->args.pages;
+	int pg_index = wdata->args.pgbase >> PAGE_CACHE_SHIFT;
+
+	dprintk("%s enter, %Zu@%lld\n", __func__, count, offset);
+	if (!wdata->lseg) {
+		dprintk("%s no lseg, falling back to MDS\n", __func__);
+		return PNFS_NOT_ATTEMPTED;
+	}
+	if (dont_like_caller(wdata->req)) {
+		dprintk("%s dont_like_caller failed\n", __func__);
+		return PNFS_NOT_ATTEMPTED;
+	}
+	/* At this point, wdata->pages is a (sequential) list of nfs_pages.
+	 * We want to write each, and if there is an error remove it from
+	 * list and call
+	 * nfs_retry_request(req) to have it redone using nfs.
+	 * QUEST? Do as block or per req?  Think have to do per block
+	 * as part of end_bio
+	 */
+	par = alloc_parallel(wdata);
+	if (!par)
+		return PNFS_NOT_ATTEMPTED;
+	par->call_ops = *wdata->mds_ops;
+	par->call_ops.rpc_call_done = bl_rpc_do_nothing;
+	par->pnfs_callback = bl_end_par_io_write;
+	/* At this point, have to be more careful with error handling */
+
+	isect = (sector_t) ((offset & (long)PAGE_CACHE_MASK) >> SECTOR_SHIFT);
+	for (i = pg_index; i < wdata->npages ; i++) {
+		if (!extent_length) {
+			/* We've used up the previous extent */
+			bl_put_extent(be);
+			bio = bl_submit_bio(WRITE, bio);
+			/* Get the next one */
+			be = bl_find_get_extent(BLK_LSEG2EXT(wdata->lseg),
+					     isect, NULL);
+			if (!be || !is_writable(be, isect)) {
+				wdata->pnfs_error = -ENOMEM;
+				goto out;
+			}
+			extent_length = be->be_length -
+				(isect - be->be_f_offset);
+		}
+		for (;;) {
+			if (!bio) {
+				bio = bio_alloc(GFP_NOIO, wdata->npages - i);
+				if (!bio) {
+					wdata->pnfs_error = -ENOMEM;
+					goto out;
+				}
+				bio->bi_sector = isect - be->be_f_offset +
+					be->be_v_offset;
+				bio->bi_bdev = be->be_mdev;
+				bio->bi_end_io = bl_end_io_write;
+				bio->bi_private = par;
+			}
+			if (bio_add_page(bio, pages[i], PAGE_SIZE, 0))
+				break;
+			bio = bl_submit_bio(WRITE, bio);
+		}
+		isect += PAGE_CACHE_SECTORS;
+		extent_length -= PAGE_CACHE_SECTORS;
+	}
+	wdata->res.count = (isect << SECTOR_SHIFT) - (offset);
+	if (count < wdata->res.count)
+		wdata->res.count = count;
+out:
+	bl_put_extent(be);
+	bl_submit_bio(WRITE, bio);
+	put_parallel(par);
+	return PNFS_ATTEMPTED;
 }
 
 /* FIXME - range ignored */
-- 
1.7.4.1
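
bl_write_pagelist starts from the page-aligned sector containing args.offset. It maps each file sector into the volume with isect - be_f_offset + be_v_offset, and reports res.count as the bytes up to the end of the last page written, clamped to args.count. Below is a sketch of those three computations. It assumes 4 KiB pages and 512-byte sectors, and struct extent and the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define PAGE_SHIFT_4K 12 /* assumption: 4 KiB pages */

/* Hypothetical subset of struct pnfs_block_extent used for mapping. */
struct extent {
	uint64_t f_offset; /* first file sector covered (be_f_offset) */
	uint64_t length;   /* length in sectors (be_length) */
	uint64_t v_offset; /* first volume sector (be_v_offset) */
};

/* Sector of the page containing byte 'offset'. */
static uint64_t page_start_sector(uint64_t offset)
{
	uint64_t page_mask = ~((UINT64_C(1) << PAGE_SHIFT_4K) - 1);

	return (offset & page_mask) >> SECTOR_SHIFT;
}

/* File sector -> volume sector, as used for bio->bi_sector. */
static uint64_t volume_sector(const struct extent *be, uint64_t isect)
{
	assert(isect >= be->f_offset && isect < be->f_offset + be->length);
	return isect - be->f_offset + be->v_offset;
}

/* Bytes from 'offset' to the end of the last page written (end_isect),
 * clamped to the requested count. */
static size_t write_result_count(uint64_t offset, size_t count,
				 uint64_t end_isect)
{
	uint64_t done = (end_isect << SECTOR_SHIFT) - offset;

	return done > count ? count : (size_t)done;
}
```

The clamp matters when the request does not end on a page boundary: the last page is written whole, but only the requested bytes are reported.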



* [PATCH v3 22/25] pnfsblock: note written INVAL areas for layoutcommit
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
                   ` (20 preceding siblings ...)
  2011-07-27 18:40 ` [PATCH v3 21/25] pnfsblock: bl_write_pagelist Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 20:13   ` Trond Myklebust
  2011-07-27 18:40 ` [PATCH v3 23/25] pnfsblock: use pageio_ops api Jim Rees
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Fred Isaman <iisaman@citi.umich.edu>

[SQUASHME: pnfs: blocklayout: port block layout code]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/blocklayout/blocklayout.c |   32 +++++++++++++
 fs/nfs/blocklayout/blocklayout.h |    2 +
 fs/nfs/blocklayout/extents.c     |   95 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 963b3ea..22de05c 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -346,6 +346,30 @@ out:
 	return PNFS_NOT_ATTEMPTED;
 }
 
+static void mark_extents_written(struct pnfs_block_layout *bl,
+				 __u64 offset, __u32 count)
+{
+	sector_t isect, end;
+	struct pnfs_block_extent *be;
+
+	dprintk("%s(%llu, %u)\n", __func__, offset, count);
+	if (count == 0)
+		return;
+	isect = (offset & (long)(PAGE_CACHE_MASK)) >> SECTOR_SHIFT;
+	end = (offset + count + PAGE_CACHE_SIZE - 1) & (long)(PAGE_CACHE_MASK);
+	end >>= SECTOR_SHIFT;
+	while (isect < end) {
+		sector_t len;
+		be = bl_find_get_extent(bl, isect, NULL);
+		BUG_ON(!be); /* FIXME */
+		len = min(end, be->be_f_offset + be->be_length) - isect;
+		if (be->be_state == PNFS_BLOCK_INVALID_DATA)
+			mark_for_commit(be, isect, len); /* What if fails? */
+		isect += len;
+		bl_put_extent(be);
+	}
+}
+
 /* This is basically copied from mpage_end_io_read */
 static void bl_end_io_write(struct bio *bio, int err)
 {
@@ -372,6 +396,14 @@ static void bl_write_cleanup(struct work_struct *work)
 	dprintk("%s enter\n", __func__);
 	task = container_of(work, struct rpc_task, u.tk_work);
 	wdata = container_of(task, struct nfs_write_data, task);
+	if (!wdata->task.tk_status) {
+		/* Marks for LAYOUTCOMMIT */
+		/* BUG - this should be called after each bio, not after
+		 * all finish, unless have some way of storing success/failure
+		 */
+		mark_extents_written(BLK_LSEG2EXT(wdata->lseg),
+				     wdata->args.offset, wdata->args.count);
+	}
 	pnfs_ld_write_done(wdata);
 }
 
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index 79f564d..b462e53 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -202,5 +202,7 @@ void clean_pnfs_block_layoutupdate(struct pnfs_block_layout *bl,
 				   int status);
 int bl_add_merge_extent(struct pnfs_block_layout *bl,
 			 struct pnfs_block_extent *new);
+int mark_for_commit(struct pnfs_block_extent *be,
+		    sector_t offset, sector_t length);
 
 #endif /* FS_NFS_NFS4BLOCKLAYOUT_H */
diff --git a/fs/nfs/blocklayout/extents.c b/fs/nfs/blocklayout/extents.c
index a9224a1..942911f 100644
--- a/fs/nfs/blocklayout/extents.c
+++ b/fs/nfs/blocklayout/extents.c
@@ -217,6 +217,48 @@ int is_sector_initialized(struct pnfs_inval_markings *marks, sector_t isect)
 	return rv;
 }
 
+/* Assume start, end already sector aligned */
+static int
+_range_has_tag(struct my_tree *tree, u64 start, u64 end, int32_t tag)
+{
+	struct pnfs_inval_tracking *pos;
+	u64 expect = 0;
+
+	dprintk("%s(%llu, %llu, %i) enter\n", __func__, start, end, tag);
+	list_for_each_entry_reverse(pos, &tree->mtt_stub, it_link) {
+		if (pos->it_sector >= end)
+			continue;
+		if (!expect) {
+			if ((pos->it_sector == end - tree->mtt_step_size) &&
+			    (pos->it_tags & (1 << tag))) {
+				expect = pos->it_sector - tree->mtt_step_size;
+				if (pos->it_sector < tree->mtt_step_size || expect < start)
+					return 1;
+				continue;
+			} else {
+				return 0;
+			}
+		}
+		if (pos->it_sector != expect || !(pos->it_tags & (1 << tag)))
+			return 0;
+		expect -= tree->mtt_step_size;
+		if (expect < start)
+			return 1;
+	}
+	return 0;
+}
+
+static int is_range_written(struct pnfs_inval_markings *marks,
+			    sector_t start, sector_t end)
+{
+	int rv;
+
+	spin_lock(&marks->im_lock);
+	rv = _range_has_tag(&marks->im_tree, start, end, EXTENT_WRITTEN);
+	spin_unlock(&marks->im_lock);
+	return rv;
+}
+
 /* Marks sectors in [offest, offset_length) as having been initialized.
  * All lengths are step-aligned, where step is min(pagesize, blocksize).
  * Notes where partial block is initialized, and helps prepare it for
@@ -396,6 +438,59 @@ static void add_to_commitlist(struct pnfs_block_layout *bl,
 	print_clist(clist, bl->bl_count);
 }
 
+/* Note the range described by offset, length is guaranteed to be contained
+ * within be.
+ */
+int mark_for_commit(struct pnfs_block_extent *be,
+		    sector_t offset, sector_t length)
+{
+	sector_t new_end, end = offset + length;
+	struct pnfs_block_short_extent *new;
+	struct pnfs_block_layout *bl = container_of(be->be_inval,
+						    struct pnfs_block_layout,
+						    bl_inval);
+
+	new = kmalloc(sizeof(*new), GFP_NOFS);
+	if (!new)
+		return -ENOMEM;
+
+	mark_written_sectors(be->be_inval, offset, length);
+	/* We want to add the range to commit list, but it must be
+	 * block-normalized, and verified that the normalized range has
+	 * been entirely written to disk.
+	 */
+	new->bse_f_offset = offset;
+	offset = normalize(offset, bl->bl_blocksize);
+	if (offset < new->bse_f_offset) {
+		if (is_range_written(be->be_inval, offset, new->bse_f_offset))
+			new->bse_f_offset = offset;
+		else
+			new->bse_f_offset = offset + bl->bl_blocksize;
+	}
+	new_end = normalize_up(end, bl->bl_blocksize);
+	if (end < new_end) {
+		if (is_range_written(be->be_inval, end, new_end))
+			end = new_end;
+		else
+			end = new_end - bl->bl_blocksize;
+	}
+	if (end <= new->bse_f_offset) {
+		kfree(new);
+		return 0;
+	}
+	new->bse_length = end - new->bse_f_offset;
+	new->bse_devid = be->be_devid;
+	new->bse_mdev = be->be_mdev;
+
+	spin_lock(&bl->bl_ext_lock);
+	/* new will be freed, either by add_to_commitlist if it decides not
+	 * to use it, or after LAYOUTCOMMIT uses it in the commitlist.
+	 */
+	add_to_commitlist(bl, new);
+	spin_unlock(&bl->bl_ext_lock);
+	return 0;
+}
+
 static void print_bl_extent(struct pnfs_block_extent *be)
 {
 	dprintk("PRINT EXTENT extent %p\n", be);
-- 
1.7.4.1
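
mark_for_commit() only adds whole blocks to the commit list. A partial block at either end is kept if the rest of that block has already been written, and dropped otherwise. The sketch below replays that decision. A plain array of per-sector flags stands in for is_range_written(), the block size is in sectors, and every name is hypothetical. As in the kernel code, the caller is assumed to have already marked [offset, offset + length) as written:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round to a multiple of bs, like normalize()/normalize_up() in extents.c. */
static uint64_t normalize_down(uint64_t s, uint64_t bs) { return s - s % bs; }
static uint64_t normalize_up(uint64_t s, uint64_t bs)
{
	return normalize_down(s + bs - 1, bs);
}

/* Hypothetical stand-in for is_range_written(): one flag per sector. */
static bool range_written(const bool *written, uint64_t start, uint64_t end)
{
	for (uint64_t s = start; s < end; s++)
		if (!written[s])
			return false;
	return true;
}

/* Grow or shrink [offset, offset + length) to fully written whole blocks.
 * Returns false when no complete block remains (the kernel frees 'new'). */
static bool commit_range(const bool *written, uint64_t offset, uint64_t length,
			 uint64_t bs, uint64_t *out_start, uint64_t *out_len)
{
	uint64_t start = offset, end = offset + length;
	uint64_t down, up;

	assert(bs > 0);
	down = normalize_down(offset, bs);
	up = normalize_up(end, bs);
	if (down < start) /* partial head block */
		start = range_written(written, down, start) ? down : down + bs;
	if (end < up)     /* partial tail block */
		end = range_written(written, end, up) ? up : up - bs;
	if (end <= start)
		return false;
	*out_start = start;
	*out_len = end - start;
	return true;
}
```

With 8-sector blocks and sectors 8..19 written, a write of [10, 20) commits [8, 16): the head block is complete, but the tail block [16, 24) is not.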



* [PATCH v3 23/25] pnfsblock: use pageio_ops api
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
                   ` (21 preceding siblings ...)
  2011-07-27 18:40 ` [PATCH v3 22/25] pnfsblock: note written INVAL areas for layoutcommit Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 20:15   ` Trond Myklebust
  2011-07-27 18:40 ` [PATCH v3 24/25] pnfsblock: write_pagelist handle zero invalid extents Jim Rees
  2011-07-27 18:40 ` [PATCH v3 25/25] NFS41: Drop lseg ref before fallthru to MDS Jim Rees
  24 siblings, 1 reply; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Benny Halevy <bhalevy@panasas.com>

[pnfsblock: use pnfs_generic_pg_init_read/write]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/blocklayout/blocklayout.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 22de05c..502d384 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -762,6 +762,18 @@ bl_clear_layoutdriver(struct nfs_server *server)
 	return 0;
 }
 
+static const struct nfs_pageio_ops bl_pg_read_ops = {
+	.pg_init = pnfs_generic_pg_init_read,
+	.pg_test = pnfs_generic_pg_test,
+	.pg_doio = pnfs_generic_pg_readpages,
+};
+
+static const struct nfs_pageio_ops bl_pg_write_ops = {
+	.pg_init = pnfs_generic_pg_init_write,
+	.pg_test = pnfs_generic_pg_test,
+	.pg_doio = pnfs_generic_pg_writepages,
+};
+
 static struct pnfs_layoutdriver_type blocklayout_type = {
 	.id				= LAYOUT_BLOCK_VOLUME,
 	.name				= "LAYOUT_BLOCK_VOLUME",
@@ -775,6 +787,8 @@ static struct pnfs_layoutdriver_type blocklayout_type = {
 	.cleanup_layoutcommit		= bl_cleanup_layoutcommit,
 	.set_layoutdriver		= bl_set_layoutdriver,
 	.clear_layoutdriver		= bl_clear_layoutdriver,
+	.pg_read_ops			= &bl_pg_read_ops,
+	.pg_write_ops			= &bl_pg_write_ops,
 };
 
 static const struct rpc_pipe_ops bl_upcall_ops = {
-- 
1.7.4.1
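
With the pageio_ops hooks, the generic layer drives coalescing and submission, and the layout driver only supplies the callbacks. Below is a much-simplified analogue of that dispatch pattern. In the real code, pg_test (pnfs_generic_pg_test) checks requests against the layout segment. Here an assumed page-contiguity test stands in for it, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified analogue of struct nfs_pageio_ops. */
struct pageio_ops {
	bool (*pg_test)(size_t prev, size_t next);            /* may next join? */
	void (*pg_doio)(size_t first, size_t npages, void *ctx); /* submit batch */
};

/* Generic layer: coalesce page indices while pg_test agrees, then flush
 * each batch through pg_doio. */
static void pageio_run(const struct pageio_ops *ops, const size_t *idx,
		       size_t n, void *ctx)
{
	size_t start = 0;

	assert(ops && ops->pg_test && ops->pg_doio);
	for (size_t i = 1; i <= n; i++) {
		if (i < n && ops->pg_test(idx[i - 1], idx[i]))
			continue;
		ops->pg_doio(idx[start], i - start, ctx);
		start = i;
	}
}

/* Driver side: assumed contiguity test and a counting "doio". */
static bool contiguous_test(size_t prev, size_t next) { return next == prev + 1; }

struct batch_log { size_t calls; size_t pages; };

static void count_doio(size_t first, size_t npages, void *ctx)
{
	struct batch_log *log = ctx;

	(void)first;
	log->calls++;
	log->pages += npages;
}

static const struct pageio_ops example_ops = {
	.pg_test = contiguous_test,
	.pg_doio = count_doio,
};
```

Pages {0, 1, 2, 5, 6, 9} are submitted as three batches. The driver never loops over pages itself, which mirrors why blocklayout can reuse the pnfs_generic_pg_* helpers.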



* [PATCH v3 24/25] pnfsblock: write_pagelist handle zero invalid extents
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
                   ` (22 preceding siblings ...)
  2011-07-27 18:40 ` [PATCH v3 23/25] pnfsblock: use pageio_ops api Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 18:40 ` [PATCH v3 25/25] NFS41: Drop lseg ref before fallthru to MDS Jim Rees
  24 siblings, 0 replies; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Peng Tao <bergwolf@gmail.com>

For INVALID extents, also find the other pages in the same fsblock and write
them out (zeroed, or read in for copy-on-write), so that the whole block is
initialized.

[pnfsblock: write_begin]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
---
 fs/nfs/blocklayout/blocklayout.c |  275 ++++++++++++++++++++++++++++++++------
 1 files changed, 233 insertions(+), 42 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 502d384..5abdeac 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -35,6 +35,7 @@
 #include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/bio.h>		/* struct bio */
+#include <linux/buffer_head.h>	/* various write calls */
 
 #include "blocklayout.h"
 
@@ -79,12 +80,8 @@ static int is_hole(struct pnfs_block_extent *be, sector_t isect)
  */
 static int is_writable(struct pnfs_block_extent *be, sector_t isect)
 {
-	if (be->be_state == PNFS_BLOCK_READWRITE_DATA)
-		return 1;
-	else if (be->be_state != PNFS_BLOCK_INVALID_DATA)
-		return 0;
-	else
-		return is_sector_initialized(be->be_inval, isect);
+	return (be->be_state == PNFS_BLOCK_READWRITE_DATA ||
+		be->be_state == PNFS_BLOCK_INVALID_DATA);
 }
 
 static int
@@ -370,6 +367,31 @@ static void mark_extents_written(struct pnfs_block_layout *bl,
 	}
 }
 
+static void bl_end_io_write_zero(struct bio *bio, int err)
+{
+	struct parallel_io *par = bio->bi_private;
+	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct nfs_write_data *wdata = (struct nfs_write_data *)par->data;
+
+	do {
+		struct page *page = bvec->bv_page;
+
+		if (--bvec >= bio->bi_io_vec)
+			prefetchw(&bvec->bv_page->flags);
+		/* This is the zeroing page we added */
+		end_page_writeback(page);
+		page_cache_release(page);
+	} while (bvec >= bio->bi_io_vec);
+	if (!uptodate) {
+		if (!wdata->pnfs_error)
+			wdata->pnfs_error = -EIO;
+		bl_set_lo_fail(wdata->lseg);
+	}
+	bio_put(bio);
+	put_parallel(par);
+}
+
 /* This is basically copied from mpage_end_io_read */
 static void bl_end_io_write(struct bio *bio, int err)
 {
@@ -396,11 +418,8 @@ static void bl_write_cleanup(struct work_struct *work)
 	dprintk("%s enter\n", __func__);
 	task = container_of(work, struct rpc_task, u.tk_work);
 	wdata = container_of(task, struct nfs_write_data, task);
-	if (!wdata->task.tk_status) {
+	if (!wdata->pnfs_error) {
 		/* Marks for LAYOUTCOMMIT */
-		/* BUG - this should be called after each bio, not after
-		 * all finish, unless have some way of storing success/failure
-		 */
 		mark_extents_written(BLK_LSEG2EXT(wdata->lseg),
 				     wdata->args.offset, wdata->args.count);
 	}
@@ -408,30 +427,105 @@ static void bl_write_cleanup(struct work_struct *work)
 }
 
 /* Called when last of bios associated with a bl_write_pagelist call finishes */
-static void
-bl_end_par_io_write(void *data)
+static void bl_end_par_io_write(void *data)
 {
 	struct nfs_write_data *wdata = data;
 
-	/* STUB - ignoring error handling */
 	wdata->task.tk_status = 0;
 	wdata->verf.committed = NFS_FILE_SYNC;
 	INIT_WORK(&wdata->task.u.tk_work, bl_write_cleanup);
 	schedule_work(&wdata->task.u.tk_work);
 }
 
+/* FIXME STUB - mark intersection of layout and page as bad, so is not
+ * used again.
+ */
+static void mark_bad_read(void)
+{
+	return;
+}
+
+/*
+ * map_block:  map a requested I/O block (isect) into an offset in the LVM
+ * block_device
+ */
+static void
+map_block(struct buffer_head *bh, sector_t isect, struct pnfs_block_extent *be)
+{
+	dprintk("%s enter be=%p\n", __func__, be);
+
+	set_buffer_mapped(bh);
+	bh->b_bdev = be->be_mdev;
+	bh->b_blocknr = (isect - be->be_f_offset + be->be_v_offset) >>
+	    (be->be_mdev->bd_inode->i_blkbits - SECTOR_SHIFT);
+
+	dprintk("%s isect %llu, bh->b_blocknr %ld, using bsize %Zd\n",
+		__func__, (unsigned long long)isect, (long)bh->b_blocknr,
+		bh->b_size);
+	return;
+}
+
+/* Given an unmapped page, zero it or read in page for COW, page is locked
+ * by caller.
+ */
+static int
+init_page_for_write(struct page *page, struct pnfs_block_extent *cow_read)
+{
+	struct buffer_head *bh = NULL;
+	int ret = 0;
+	sector_t isect;
+
+	dprintk("%s enter, %p\n", __func__, page);
+	BUG_ON(PageUptodate(page));
+	if (!cow_read) {
+		zero_user_segment(page, 0, PAGE_SIZE);
+		SetPageUptodate(page);
+		goto cleanup;
+	}
+
+	bh = alloc_page_buffers(page, PAGE_CACHE_SIZE, 0);
+	if (!bh) {
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
+	isect = (sector_t) page->index << PAGE_CACHE_SECTOR_SHIFT;
+	map_block(bh, isect, cow_read);
+	if (!bh_uptodate_or_lock(bh))
+		ret = bh_submit_read(bh);
+	if (ret)
+		goto cleanup;
+	SetPageUptodate(page);
+
+cleanup:
+	bl_put_extent(cow_read);
+	if (bh)
+		free_buffer_head(bh);
+	if (ret) {
+		/* Need to mark layout with bad read...should now
+		 * just use nfs4 for reads and writes.
+		 */
+		mark_bad_read();
+	}
+	return ret;
+}
+
 static enum pnfs_try_status
 bl_write_pagelist(struct nfs_write_data *wdata, int sync)
 {
-	int i;
+	int i, ret, npg_zero, pg_index, last = 0;
 	struct bio *bio = NULL;
-	struct pnfs_block_extent *be = NULL;
-	sector_t isect, extent_length = 0;
+	struct pnfs_block_extent *be = NULL, *cow_read = NULL;
+	sector_t isect, last_isect = 0, extent_length = 0;
 	struct parallel_io *par;
 	loff_t offset = wdata->args.offset;
 	size_t count = wdata->args.count;
 	struct page **pages = wdata->args.pages;
-	int pg_index = wdata->args.pgbase >> PAGE_CACHE_SHIFT;
+	struct page *page;
+	pgoff_t index;
+	u64 temp;
+	int npg_per_block =
+	    NFS_SERVER(wdata->inode)->pnfs_blksize >> PAGE_CACHE_SHIFT;
 
 	dprintk("%s enter, %Zu@%lld\n", __func__, count, offset);
 	if (!wdata->lseg) {
@@ -443,11 +537,8 @@ bl_write_pagelist(struct nfs_write_data *wdata, int sync)
 		return PNFS_NOT_ATTEMPTED;
 	}
 	/* At this point, wdata->pages is a (sequential) list of nfs_pages.
-	 * We want to write each, and if there is an error remove it from
-	 * list and call
-	 * nfs_retry_request(req) to have it redone using nfs.
-	 * QUEST? Do as block or per req?  Think have to do per block
-	 * as part of end_bio
+	 * We want to write each, and if there is an error set pnfs_error
+	 * to have it redone using nfs.
 	 */
 	par = alloc_parallel(wdata);
 	if (!par)
@@ -458,7 +549,91 @@ bl_write_pagelist(struct nfs_write_data *wdata, int sync)
 	/* At this point, have to be more careful with error handling */
 
 	isect = (sector_t) ((offset & (long)PAGE_CACHE_MASK) >> SECTOR_SHIFT);
-	for (i = pg_index; i < wdata->npages ; i++) {
+	be = bl_find_get_extent(BLK_LSEG2EXT(wdata->lseg), isect, &cow_read);
+	if (!be || !is_writable(be, isect)) {
+		dprintk("%s no matching extents!\n", __func__);
+		wdata->pnfs_error = -EINVAL;
+		goto out;
+	}
+
+	/* First page inside INVALID extent */
+	if (be->be_state == PNFS_BLOCK_INVALID_DATA) {
+		temp = offset >> PAGE_CACHE_SHIFT;
+		npg_zero = do_div(temp, npg_per_block);
+		isect = (sector_t) (((offset - npg_zero * PAGE_CACHE_SIZE) &
+				     (long)PAGE_CACHE_MASK) >> SECTOR_SHIFT);
+		extent_length = be->be_length - (isect - be->be_f_offset);
+
+fill_invalid_ext:
+		dprintk("%s need to zero %d pages\n", __func__, npg_zero);
+		for (;npg_zero > 0; npg_zero--) {
+			/* page ref released in bl_end_io_write_zero */
+			index = isect >> PAGE_CACHE_SECTOR_SHIFT;
+			dprintk("%s zero %dth page: index %lu isect %llu\n",
+				__func__, npg_zero, index,
+				(unsigned long long)isect);
+			page =
+			    find_or_create_page(wdata->inode->i_mapping, index,
+						GFP_NOFS);
+			if (!page) {
+				dprintk("%s oom\n", __func__);
+				wdata->pnfs_error = -ENOMEM;
+				goto out;
+			}
+
+			/* PageDirty: Other will write this out
+			 * PageWriteback: Other is writing this out
+			 * PageUptodate: It was read before
+			 * sector_initialized: already written out
+			 */
+			if (PageDirty(page) || PageWriteback(page) ||
+			    is_sector_initialized(be->be_inval, isect)) {
+				print_page(page);
+				unlock_page(page);
+				page_cache_release(page);
+				goto next_page;
+			}
+			if (!PageUptodate(page)) {
+				/* New page, readin or zero it */
+				init_page_for_write(page, cow_read);
+			}
+			set_page_writeback(page);
+			unlock_page(page);
+
+			ret = bl_mark_sectors_init(be->be_inval, isect,
+						       PAGE_CACHE_SECTORS,
+						       NULL);
+			if (unlikely(ret)) {
+				dprintk("%s bl_mark_sectors_init fail %d\n",
+					__func__, ret);
+				end_page_writeback(page);
+				page_cache_release(page);
+				wdata->pnfs_error = ret;
+				goto out;
+			}
+			bio = bl_add_page_to_bio(bio, npg_zero, WRITE,
+						 isect, page, be,
+						 bl_end_io_write_zero, par);
+			if (IS_ERR(bio)) {
+				wdata->pnfs_error = PTR_ERR(bio);
+				goto out;
+			}
+			/* FIXME: This should be done in bi_end_io */
+			mark_extents_written(BLK_LSEG2EXT(wdata->lseg),
+					     page->index << PAGE_CACHE_SHIFT,
+					     PAGE_CACHE_SIZE);
+next_page:
+			isect += PAGE_CACHE_SECTORS;
+			extent_length -= PAGE_CACHE_SECTORS;
+		}
+		if (last)
+			goto write_done;
+	}
+	bio = bl_submit_bio(WRITE, bio);
+
+	/* Middle pages */
+	pg_index = wdata->args.pgbase >> PAGE_CACHE_SHIFT;
+	for (i = pg_index; i < wdata->npages; i++) {
 		if (!extent_length) {
 			/* We've used up the previous extent */
 			bl_put_extent(be);
@@ -467,35 +642,51 @@ bl_write_pagelist(struct nfs_write_data *wdata, int sync)
 			be = bl_find_get_extent(BLK_LSEG2EXT(wdata->lseg),
 					     isect, NULL);
 			if (!be || !is_writable(be, isect)) {
-				wdata->pnfs_error = -ENOMEM;
+				wdata->pnfs_error = -EINVAL;
 				goto out;
 			}
 			extent_length = be->be_length -
-				(isect - be->be_f_offset);
+			    (isect - be->be_f_offset);
 		}
-		for (;;) {
-			if (!bio) {
-				bio = bio_alloc(GFP_NOIO, wdata->npages - i);
-				if (!bio) {
-					wdata->pnfs_error = -ENOMEM;
-					goto out;
-				}
-				bio->bi_sector = isect - be->be_f_offset +
-					be->be_v_offset;
-				bio->bi_bdev = be->be_mdev;
-				bio->bi_end_io = bl_end_io_write;
-				bio->bi_private = par;
+		if (be->be_state == PNFS_BLOCK_INVALID_DATA) {
+			ret = bl_mark_sectors_init(be->be_inval, isect,
+						       PAGE_CACHE_SECTORS,
+						       NULL);
+			if (unlikely(ret)) {
+				dprintk("%s bl_mark_sectors_init fail %d\n",
+					__func__, ret);
+				wdata->pnfs_error = ret;
+				goto out;
 			}
-			if (bio_add_page(bio, pages[i], PAGE_SIZE, 0))
-				break;
-			bio = bl_submit_bio(WRITE, bio);
+		}
+		bio = bl_add_page_to_bio(bio, wdata->npages - i, WRITE,
+					 isect, pages[i], be,
+					 bl_end_io_write, par);
+		if (IS_ERR(bio)) {
+			wdata->pnfs_error = PTR_ERR(bio);
+			goto out;
 		}
 		isect += PAGE_CACHE_SECTORS;
+		last_isect = isect;
 		extent_length -= PAGE_CACHE_SECTORS;
 	}
-	wdata->res.count = (isect << SECTOR_SHIFT) - (offset);
-	if (count < wdata->res.count)
+
+	/* Last page inside INVALID extent */
+	if (be->be_state == PNFS_BLOCK_INVALID_DATA) {
+		bio = bl_submit_bio(WRITE, bio);
+		temp = last_isect >> PAGE_CACHE_SECTOR_SHIFT;
+		npg_zero = npg_per_block - do_div(temp, npg_per_block);
+		if (npg_zero < npg_per_block) {
+			last = 1;
+			goto fill_invalid_ext;
+		}
+	}
+
+write_done:
+	wdata->res.count = (last_isect << SECTOR_SHIFT) - (offset);
+	if (count < wdata->res.count) {
 		wdata->res.count = count;
+	}
 out:
 	bl_put_extent(be);
 	bl_submit_bio(WRITE, bio);
-- 
1.7.4.1
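
When the write starts or ends inside an INVALID extent, patch 24 pads out to the fs block boundary. The head count is the first page index modulo npg_per_block (the do_div remainder). The tail count is npg_per_block minus the remainder of the page index just past the last written page, and it is skipped when that equals a full block. Below is a sketch of the two counts. It assumes 4 KiB pages and a block size that is a whole multiple of the page size (npg_per_block >= 1), and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define PAGE_SHIFT_4K 12 /* assumption: 4 KiB pages */
#define PAGE_SECTOR_SHIFT (PAGE_SHIFT_4K - SECTOR_SHIFT)

/* Pages of the same fs block that precede the first page being written. */
static unsigned head_pages_to_fill(uint64_t offset, unsigned npg_per_block)
{
	assert(npg_per_block > 0);
	return (unsigned)((offset >> PAGE_SHIFT_4K) % npg_per_block);
}

/* Pages needed after the last written page to complete its fs block.
 * last_isect is the sector just past the last written page. */
static unsigned tail_pages_to_fill(uint64_t last_isect, unsigned npg_per_block)
{
	unsigned rem, n;

	assert(npg_per_block > 0);
	rem = (unsigned)((last_isect >> PAGE_SECTOR_SHIFT) % npg_per_block);
	n = npg_per_block - rem;
	return n < npg_per_block ? n : 0; /* 0: already block aligned */
}
```

With 16 KiB blocks (4 pages), writing pages 5..6 needs one head page (page 4) and one tail page (page 7). A block-aligned write needs neither.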



* [PATCH v3 25/25] NFS41: Drop lseg ref before fallthru to MDS
  2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
                   ` (23 preceding siblings ...)
  2011-07-27 18:40 ` [PATCH v3 24/25] pnfsblock: write_pagelist handle zero invalid extents Jim Rees
@ 2011-07-27 18:40 ` Jim Rees
  2011-07-27 20:16   ` Trond Myklebust
  24 siblings, 1 reply; 55+ messages in thread
From: Jim Rees @ 2011-07-27 18:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, peter honeyman

From: Peng Tao <bergwolf@gmail.com>

There is no need to keep the lseg reference when reading/writing through
the MDS. Dropping it fixes a NULL pointer crash in
nfs_post_op_update_inode_force_wcc, because nfs4_proc_write_setup unsets
wdata->res.fattr if wdata->lseg is not NULL.

Signed-off-by: Peng Tao <peng_tao@emc.com>
---
 fs/nfs/pnfs.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 3b20753..fda3019 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1182,6 +1182,9 @@ pnfs_ld_write_done(struct nfs_write_data *data)
 
 	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
 		data->pnfs_error);
+
+	put_lseg(data->lseg);
+	data->lseg = NULL;
 	status = nfs_initiate_write(data, NFS_CLIENT(data->inode),
 				    data->mds_ops, NFS_FILE_SYNC);
 	return status ? : -EAGAIN;
@@ -1282,6 +1285,9 @@ pnfs_ld_read_done(struct nfs_read_data *data)
 
 	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
 		data->pnfs_error);
+
+	put_lseg(data->lseg);
+	data->lseg = NULL;
 	status = nfs_initiate_read(data, NFS_CLIENT(data->inode),
 				   data->mds_ops);
 	return status ? : -EAGAIN;
-- 
1.7.4.1
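
Patch 25 drops the layout-segment reference and clears the pointer before reissuing the I/O to the MDS, so later code that tests data->lseg sees NULL. The put-and-clear pattern is shown below in isolation, with a hypothetical refcounted struct standing in for struct pnfs_layout_segment:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the layout segment and the read/write data. */
struct lseg { int refcount; };
struct io_data { struct lseg *lseg; };

/* Drop one reference; free on the last one. NULL is ignored. */
static void put_lseg_sketch(struct lseg *l)
{
	if (l && --l->refcount == 0)
		free(l);
}

/* Fallback path: release the reference and clear the pointer, so code on
 * the MDS path that checks io->lseg sees NULL. */
static void fallback_to_mds(struct io_data *io)
{
	put_lseg_sketch(io->lseg);
	io->lseg = NULL;
}
```

Clearing the pointer as well as dropping the reference prevents a stale, possibly freed segment from being used, and the NULL check also steers setup code down the plain-NFS path.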



* Re: [PATCH v3 01/25] pnfs: GETDEVICELIST
  2011-07-27 18:40 ` [PATCH v3 01/25] pnfs: GETDEVICELIST Jim Rees
@ 2011-07-27 19:33   ` Trond Myklebust
  2011-07-28 18:27     ` Benny Halevy
  0 siblings, 1 reply; 55+ messages in thread
From: Trond Myklebust @ 2011-07-27 19:33 UTC (permalink / raw)
  To: Jim Rees; +Cc: linux-nfs, peter honeyman

On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote: 
> From: Andy Adamson <andros@netapp.com>
> 
> The block driver uses GETDEVICELIST
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> [pass struct nfs_server * to getdevicelist]
> [get machine creds for getdevicelist]
> [fix getdevicelist decode sizing]
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
> ---
>  fs/nfs/nfs4proc.c       |   48 ++++++++++++++++++
>  fs/nfs/nfs4xdr.c        |  128 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/nfs/pnfs.h           |   12 ++++
>  include/linux/nfs4.h    |    1 +
>  include/linux/nfs_xdr.h |   11 ++++
>  5 files changed, 200 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 079614d..ebb6f1a 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5834,6 +5834,54 @@ int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp)
>  	return status;
>  }
>  
> +/*
> + * Retrieve the list of Data Server devices from the MDS.
> + */
> +static int _nfs4_getdevicelist(struct nfs_server *server,
> +				    const struct nfs_fh *fh,
> +				    struct pnfs_devicelist *devlist)
> +{
> +	struct nfs4_getdevicelist_args args = {
> +		.fh = fh,
> +		.layoutclass = server->pnfs_curr_ld->id,
> +	};
> +	struct nfs4_getdevicelist_res res = {
> +		.devlist = devlist,
> +	};
> +	struct rpc_message msg = {
> +		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_GETDEVICELIST],
> +		.rpc_argp = &args,
> +		.rpc_resp = &res,
> +	};
> +	int status;
> +
> +	dprintk("--> %s\n", __func__);
> +	status = nfs4_call_sync(server->client, server, &msg, &args.seq_args,
> +				&res.seq_res, 0);
> +	dprintk("<-- %s status=%d\n", __func__, status);
> +	return status;
> +}
> +
> +int nfs4_proc_getdevicelist(struct nfs_server *server,
> +			    const struct nfs_fh *fh,
> +			    struct pnfs_devicelist *devlist)
> +{
> +	struct nfs4_exception exception = { };
> +	int err;
> +
> +	do {
> +		err = nfs4_handle_exception(server,
> +				_nfs4_getdevicelist(server, fh, devlist),
> +				&exception);
> +	} while (exception.retry);
> +
> +	dprintk("%s: err=%d, num_devs=%u\n", __func__,
> +		err, devlist->num_devs);
> +
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(nfs4_proc_getdevicelist);
> +
>  static int
>  _nfs4_proc_getdeviceinfo(struct nfs_server *server, struct pnfs_device *pdev)
>  {
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index c191a9b..a82dd40 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -314,6 +314,17 @@ static int nfs4_stat_to_errno(int);
>  				XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN) + 5)
>  #define encode_reclaim_complete_maxsz	(op_encode_hdr_maxsz + 4)
>  #define decode_reclaim_complete_maxsz	(op_decode_hdr_maxsz + 4)
> +#define encode_getdevicelist_maxsz (op_encode_hdr_maxsz + 4 + \
> +				encode_verifier_maxsz)
> +#define decode_getdevicelist_maxsz (op_decode_hdr_maxsz + \
> +				2 /* nfs_cookie4 gdlr_cookie */ + \
> +				decode_verifier_maxsz \
> +				  /* verifier4 gdlr_verifier */ + \
> +				1 /* gdlr_deviceid_list count */ + \
> +				XDR_QUADLEN(NFS4_PNFS_GETDEVLIST_MAXNUM * \
> +					    NFS4_DEVICEID4_SIZE) \
> +				  /* gdlr_deviceid_list */ + \
> +				1 /* bool gdlr_eof */)
>  #define encode_getdeviceinfo_maxsz (op_encode_hdr_maxsz + 4 + \
>  				XDR_QUADLEN(NFS4_DEVICEID4_SIZE))
>  #define decode_getdeviceinfo_maxsz (op_decode_hdr_maxsz + \
> @@ -748,6 +759,14 @@ static int nfs4_stat_to_errno(int);
>  #define NFS4_dec_reclaim_complete_sz	(compound_decode_hdr_maxsz + \
>  					 decode_sequence_maxsz + \
>  					 decode_reclaim_complete_maxsz)
> +#define NFS4_enc_getdevicelist_sz (compound_encode_hdr_maxsz + \
> +				encode_sequence_maxsz + \
> +				encode_putfh_maxsz + \
> +				encode_getdevicelist_maxsz)
> +#define NFS4_dec_getdevicelist_sz (compound_decode_hdr_maxsz + \
> +				decode_sequence_maxsz + \
> +				decode_putfh_maxsz + \
> +				decode_getdevicelist_maxsz)
>  #define NFS4_enc_getdeviceinfo_sz (compound_encode_hdr_maxsz +    \
>  				encode_sequence_maxsz +\
>  				encode_getdeviceinfo_maxsz)
> @@ -1855,6 +1874,26 @@ static void encode_sequence(struct xdr_stream *xdr,
>  
>  #ifdef CONFIG_NFS_V4_1
>  static void
> +encode_getdevicelist(struct xdr_stream *xdr,
> +		     const struct nfs4_getdevicelist_args *args,
> +		     struct compound_hdr *hdr)
> +{
> +	__be32 *p;
> +	nfs4_verifier dummy = {
> +		.data = "dummmmmy",
> +	};
> +
> +	p = reserve_space(xdr, 20);
> +	*p++ = cpu_to_be32(OP_GETDEVICELIST);
> +	*p++ = cpu_to_be32(args->layoutclass);
> +	*p++ = cpu_to_be32(NFS4_PNFS_GETDEVLIST_MAXNUM);
> +	xdr_encode_hyper(p, 0ULL);                          /* cookie */
> +	encode_nfs4_verifier(xdr, &dummy);
> +	hdr->nops++;
> +	hdr->replen += decode_getdevicelist_maxsz;
> +}
> +
> +static void
>  encode_getdeviceinfo(struct xdr_stream *xdr,
>  		     const struct nfs4_getdeviceinfo_args *args,
>  		     struct compound_hdr *hdr)
> @@ -2775,6 +2814,24 @@ static void nfs4_xdr_enc_reclaim_complete(struct rpc_rqst *req,
>  }
>  
>  /*
> + * Encode GETDEVICELIST request
> + */
> +static void nfs4_xdr_enc_getdevicelist(struct rpc_rqst *req,
> +				       struct xdr_stream *xdr,
> +				       struct nfs4_getdevicelist_args *args)
> +{
> +	struct compound_hdr hdr = {
> +		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
> +	};
> +
> +	encode_compound_hdr(xdr, req, &hdr);
> +	encode_sequence(xdr, &args->seq_args, &hdr);
> +	encode_putfh(xdr, args->fh, &hdr);
> +	encode_getdevicelist(xdr, args, &hdr);
> +	encode_nops(&hdr);
> +}
> +
> +/*
>   * Encode GETDEVICEINFO request
>   */
>  static void nfs4_xdr_enc_getdeviceinfo(struct rpc_rqst *req,
> @@ -5268,6 +5325,50 @@ out_overflow:
>  }
>  
>  #if defined(CONFIG_NFS_V4_1)
> +/*
> + * TODO: Need to handle case when EOF != true;
> + */
> +static int decode_getdevicelist(struct xdr_stream *xdr,
> +				struct pnfs_devicelist *res)
> +{
> +	__be32 *p;
> +	int status, i;
> +	struct nfs_writeverf verftemp;
> +
> +	status = decode_op_hdr(xdr, OP_GETDEVICELIST);
> +	if (status)
> +		return status;
> +
> +	p = xdr_inline_decode(xdr, 8 + 8 + 4);
> +	if (unlikely(!p))
> +		goto out_overflow;
> +
> +	/* TODO: Skip cookie for now */
> +	p += 2;
> +
> +	/* Read verifier */
> +	p = xdr_decode_opaque_fixed(p, verftemp.verifier, 8);
> +
> +	res->num_devs = be32_to_cpup(p);
> +
> +	dprintk("%s: num_dev %d\n", __func__, res->num_devs);
> +
> +	if (res->num_devs > NFS4_PNFS_GETDEVLIST_MAXNUM)
> +		return -NFS4ERR_REP_TOO_BIG;

The client shouldn't ever be using NFS4ERR_REP_TOO_BIG (or any other NFS
protocol error) to pass messages to itself. For one thing, we can't map
that to a POSIX error, and secondly its use above conflicts with the
protocol meaning of the same error.

So please change the above to 'EIO', and perhaps add a ratelimited
printk() so that syslog can capture why the error is happening.
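A minimal userspace sketch of the check described above. It assumes a hypothetical cap, MODEL_GETDEVLIST_MAXNUM, standing in for NFS4_PNFS_GETDEVLIST_MAXNUM, and uses a crude once-per-second limiter as a stand-in for the kernel's printk_ratelimited(). It decodes the big-endian count the way be32_to_cpup() does in the patch and returns -EIO, a POSIX error, instead of reusing NFS4ERR_REP_TOO_BIG:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical cap; the real limit is NFS4_PNFS_GETDEVLIST_MAXNUM. */
#define MODEL_GETDEVLIST_MAXNUM 16u

/* Crude stand-in for printk_ratelimited(): at most one message per second.
 * This is not the kernel's implementation. */
static int model_ratelimit_ok(void)
{
	static time_t last;
	time_t now = time(NULL);

	if (last != 0 && now == last)
		return 0;
	last = now;
	return 1;
}

/* Decode the 4-byte big-endian gdlr_deviceid_list count and reject counts
 * the client cannot hold. Returns 0 and stores the count on success, or
 * -EIO (leaving *num_devs untouched) when the server sent too many. */
static int model_decode_num_devs(const unsigned char *wire,
				 unsigned int *num_devs)
{
	uint32_t n;

	assert(wire != NULL && num_devs != NULL);
	n = ((uint32_t)wire[0] << 24) | ((uint32_t)wire[1] << 16) |
	    ((uint32_t)wire[2] << 8) | (uint32_t)wire[3];

	if (n > MODEL_GETDEVLIST_MAXNUM) {
		/* Log why the call fails, but do not flood syslog. */
		if (model_ratelimit_ok())
			fprintf(stderr,
				"%s: server returned %u device IDs, client limit is %u\n",
				__func__, (unsigned int)n,
				MODEL_GETDEVLIST_MAXNUM);
		return -EIO;
	}
	*num_devs = n;
	return 0;
}
```

The error the caller sees is then something the VFS can hand back to an application unchanged, and the log line carries the protocol-level detail.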


> @@ -6902,6 +7029,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
>  	PROC(GET_LEASE_TIME,	enc_get_lease_time,	dec_get_lease_time),
>  	PROC(RECLAIM_COMPLETE,	enc_reclaim_complete,	dec_reclaim_complete),
>  	PROC(GETDEVICEINFO,	enc_getdeviceinfo,	dec_getdeviceinfo),
> +	PROC(GETDEVICELIST,	enc_getdevicelist,	dec_getdevicelist),

This needs to go at the end of the list of procedures in order to work
correctly with nfsstat.
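To illustrate why the position matters: assuming, as the comment implies, that nfsstat labels the per-procedure counters by position, inserting GETDEVICELIST in the middle shifts every later counter onto the wrong label, while appending leaves the existing slots alone. The tables below are hypothetical, abbreviated stand-ins for the real NFSPROC4_CLNT_* numbering:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, abbreviated procedure orderings (not the full kernel list). */
static const char *const old_order[] = {
	"GETDEVICEINFO", "LAYOUTGET", "LAYOUTCOMMIT", "LAYOUTRETURN",
};
static const char *const inserted[] = {
	"GETDEVICEINFO", "GETDEVICELIST", "LAYOUTGET", "LAYOUTCOMMIT",
	"LAYOUTRETURN",
};
static const char *const appended[] = {
	"GETDEVICEINFO", "LAYOUTGET", "LAYOUTCOMMIT", "LAYOUTRETURN",
	"GETDEVICELIST",
};

/* A tool built against old_order labels counter i as old_order[i].
 * Count how many of those positions still hold the same procedure in
 * the new table, i.e. how many counters would be labelled correctly. */
static size_t stable_positions(const char *const *newtab, size_t n)
{
	size_t i, same = 0;

	assert(n >= NELEM(old_order));
	for (i = 0; i < NELEM(old_order); i++)
		if (strcmp(old_order[i], newtab[i]) == 0)
			same++;
	return same;
}
```

With the new procedure appended, all four existing counters keep their labels; inserted before LAYOUTGET, only GETDEVICEINFO does.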

> 	PROC(LAYOUTGET,		enc_layoutget,		dec_layoutget),
>  	PROC(LAYOUTCOMMIT,	enc_layoutcommit,	dec_layoutcommit),
>  	PROC(LAYOUTRETURN,	enc_layoutreturn,	dec_layoutreturn),


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


* Re: [PATCH v3 02/25] pnfs: add set-clear layoutdriver interface
  2011-07-27 18:40 ` [PATCH v3 02/25] pnfs: add set-clear layoutdriver interface Jim Rees
@ 2011-07-27 19:36   ` Trond Myklebust
  0 siblings, 0 replies; 55+ messages in thread
From: Trond Myklebust @ 2011-07-27 19:36 UTC
  To: Jim Rees; +Cc: linux-nfs, peter honeyman

On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote: 
> From: Benny Halevy <bhalevy@panasas.com>
> 
> To allow layout driver to issue getdevicelist at mount time, and clean up
> at umount time.
> 
> [fixup non NFS_V4_1 set_pnfs_layoutdriver definition]
> [pnfs: pass mntfh down the init_pnfs path]
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
> ---
>  fs/nfs/client.c |    8 +++++---
>  fs/nfs/pnfs.c   |   16 ++++++++++++++--
>  fs/nfs/pnfs.h   |    8 ++++++--
>  3 files changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 19ea7d9..a9b1848 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -904,7 +904,9 @@ error:
>  /*
>   * Load up the server record from information gained in an fsinfo record
>   */
> -static void nfs_server_set_fsinfo(struct nfs_server *server, struct nfs_fsinfo *fsinfo)
> +static void nfs_server_set_fsinfo(struct nfs_server *server,
> +				  struct nfs_fh *mntfh,
> +				  struct nfs_fsinfo *fsinfo)
>  {
>  	unsigned long max_rpc_payload;
>  
> @@ -934,7 +936,7 @@ static void nfs_server_set_fsinfo(struct nfs_server *server, struct nfs_fsinfo *
>  	if (server->wsize > NFS_MAX_FILE_IO_SIZE)
>  		server->wsize = NFS_MAX_FILE_IO_SIZE;
>  	server->wpages = (server->wsize + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> -	set_pnfs_layoutdriver(server, fsinfo->layouttype);
> +	set_pnfs_layoutdriver(server, mntfh, fsinfo->layouttype);
>  
>  	server->wtmult = nfs_block_bits(fsinfo->wtmult, NULL);
>  
> @@ -980,7 +982,7 @@ static int nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, str
>  	if (error < 0)
>  		goto out_error;
>  
> -	nfs_server_set_fsinfo(server, &fsinfo);
> +	nfs_server_set_fsinfo(server, mntfh, &fsinfo);
>  
>  	/* Get some general file system info */
>  	if (server->namelen == 0) {
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 38e5508..8e72724 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -76,8 +76,11 @@ find_pnfs_driver(u32 id)
>  void
>  unset_pnfs_layoutdriver(struct nfs_server *nfss)
>  {
> -	if (nfss->pnfs_curr_ld)
> +	if (nfss->pnfs_curr_ld) {
> +		if (nfss->pnfs_curr_ld->clear_layoutdriver)
> +			nfss->pnfs_curr_ld->clear_layoutdriver(nfss);
>  		module_put(nfss->pnfs_curr_ld->owner);
> +	}
>  	nfss->pnfs_curr_ld = NULL;
>  }
>  
> @@ -88,7 +91,8 @@ unset_pnfs_layoutdriver(struct nfs_server *nfss)
>   * @id layout type. Zero (illegal layout type) indicates pNFS not in use.
>   */
>  void
> -set_pnfs_layoutdriver(struct nfs_server *server, u32 id)
> +set_pnfs_layoutdriver(struct nfs_server *server, const struct nfs_fh *mntfh,
> +		      u32 id)
>  {
>  	struct pnfs_layoutdriver_type *ld_type = NULL;
>  
> @@ -115,6 +119,14 @@ set_pnfs_layoutdriver(struct nfs_server *server, u32 id)
>  		goto out_no_driver;
>  	}
>  	server->pnfs_curr_ld = ld_type;
> +	if (ld_type->set_layoutdriver
> +	    && ld_type->set_layoutdriver(server, mntfh)) {
> +		printk(KERN_ERR
> +		       "%s: Error initializing mount point for layout driver %u.\n",

Please fix this error message: as it stands, it doesn't make sense.


* Re: [PATCH v3 04/25] NFS41: save layoutcommit cred after first successful layoutget
  2011-07-27 18:40 ` [PATCH v3 04/25] NFS41: save layoutcommit cred after first successful layoutget Jim Rees
@ 2011-07-27 19:47   ` Trond Myklebust
  2011-07-28  3:29     ` tao.peng
  0 siblings, 1 reply; 55+ messages in thread
From: Trond Myklebust @ 2011-07-27 19:47 UTC
  To: Jim Rees; +Cc: linux-nfs, peter honeyman

On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote: 
> From: Peng Tao <bergwolf@gmail.com>
> 
> Save it in layout header instead of in every segment.
> 
> Signed-off-by: Peng Tao <peng_tao@emc.com>
> ---
>  fs/nfs/pnfs.c |   12 ++++++------
>  fs/nfs/pnfs.h |    1 -
>  2 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index dae19dd..fbebd2a 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -211,6 +211,7 @@ destroy_layout_hdr(struct pnfs_layout_hdr *lo)
>  	dprintk("%s: freeing layout cache %p\n", __func__, lo);
>  	BUG_ON(!list_empty(&lo->plh_layouts));
>  	NFS_I(lo->plh_inode)->layout = NULL;
> +	put_rpccred(lo->plh_lc_cred);
>  	pnfs_free_layout_hdr(lo);
>  }
>  
> @@ -1007,6 +1008,10 @@ pnfs_update_layout(struct inode *ino,
>  		list_del_init(&lo->plh_layouts);
>  		spin_unlock(&clp->cl_lock);
>  	}
> +	if (first) {
> +		lo->plh_lc_cred =
> +			get_rpccred(ctx->state->owner->so_cred);
> +	}

Hmm... This whole 'first' thing is inherently racy. What if two threads
are calling pnfs_update_layout at the same time: they both see
list_empty(&lo->plh_segs), and both conclude that they need to
initialise the layout header. _BOOM_....

The correct thing to do here is either to initialise the layout header
correctly inside pnfs_find_alloc_layout(), or to add a flag for
->plh_flags that can be used with test_and_clear_bit() to get _one_ of
the processes to perform the initialisation.
Personally, I'd prefer the former.


* Re: [PATCH v3 06/25] pnfs: cleanup_layoutcommit
  2011-07-27 18:40 ` [PATCH v3 06/25] pnfs: cleanup_layoutcommit Jim Rees
@ 2011-07-27 19:53   ` Trond Myklebust
  2011-07-27 20:20     ` Boaz Harrosh
  0 siblings, 1 reply; 55+ messages in thread
From: Trond Myklebust @ 2011-07-27 19:53 UTC
  To: Jim Rees; +Cc: linux-nfs, peter honeyman

On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote: 
> From: Andy Adamson <andros@netapp.com>
> 
> This gives layout driver a chance to cleanup structures they put in at
> encode_layoutcommit.
> 
> Signed-off-by: Andy Adamson <andros@netapp.com>
> [fixup layout header pointer for layoutcommit]
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
> ---
>  fs/nfs/nfs4proc.c       |    1 +
>  fs/nfs/nfs4xdr.c        |    1 +
>  fs/nfs/pnfs.c           |   10 ++++++++++
>  fs/nfs/pnfs.h           |    5 +++++
>  include/linux/nfs_xdr.h |    1 +
>  5 files changed, 18 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index e86de79..6cb84b4 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5963,6 +5963,7 @@ static void nfs4_layoutcommit_release(void *calldata)
>  	struct nfs4_layoutcommit_data *data = calldata;
>  	struct pnfs_layout_segment *lseg, *tmp;
>  
> +	pnfs_cleanup_layoutcommit(data->args.inode, data);
>  	/* Matched by references in pnfs_set_layoutcommit */
>  	list_for_each_entry_safe(lseg, tmp, &data->lseg_list, pls_lc_list) {
>  		list_del_init(&lseg->pls_lc_list);
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index f32dde9..ea2da30 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -5596,6 +5596,7 @@ static int decode_layoutcommit(struct xdr_stream *xdr,
>  	int status;
>  
>  	status = decode_op_hdr(xdr, OP_LAYOUTCOMMIT);
> +	res->status = status;
>  	if (status)
>  		return status;
>  
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index fbebd2a..3b20753 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1411,6 +1411,16 @@ pnfs_set_layoutcommit(struct nfs_write_data *wdata)
>  }
>  EXPORT_SYMBOL_GPL(pnfs_set_layoutcommit);
>  
> +void pnfs_cleanup_layoutcommit(struct inode *inode,
> +			       struct nfs4_layoutcommit_data *data)
> +{
> +	struct nfs_server *nfss = NFS_SERVER(inode);
> +
> +	if (nfss->pnfs_curr_ld->cleanup_layoutcommit)
> +		nfss->pnfs_curr_ld->cleanup_layoutcommit(NFS_I(inode)->layout,
> +							 data);
> +}
> +
>  /*
>   * For the LAYOUT4_NFSV4_1_FILES layout type, NFS_DATA_SYNC WRITEs and
>   * NFS_UNSTABLE WRITEs with a COMMIT to data servers must store enough
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index bddd8b9..f271425 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -113,6 +113,9 @@ struct pnfs_layoutdriver_type {
>  				     struct xdr_stream *xdr,
>  				     const struct nfs4_layoutreturn_args *args);
>  
> +	void (*cleanup_layoutcommit) (struct pnfs_layout_hdr *layoutid,
> +				      struct nfs4_layoutcommit_data *data);
> +
>  	void (*encode_layoutcommit) (struct pnfs_layout_hdr *layoutid,
>  				     struct xdr_stream *xdr,
>  				     const struct nfs4_layoutcommit_args *args);

This really needs to go. We should have

      int (*layoutcommit)()...

instead of 'encode' and 'cleanup' methods...

> @@ -196,6 +199,8 @@ void pnfs_roc_release(struct inode *ino);
>  void pnfs_roc_set_barrier(struct inode *ino, u32 barrier);
>  bool pnfs_roc_drain(struct inode *ino, u32 *barrier);
>  void pnfs_set_layoutcommit(struct nfs_write_data *wdata);
> +void pnfs_cleanup_layoutcommit(struct inode *inode,
> +			       struct nfs4_layoutcommit_data *data);
>  int pnfs_layoutcommit_inode(struct inode *inode, bool sync);
>  int _pnfs_return_layout(struct inode *);
>  int pnfs_ld_write_done(struct nfs_write_data *);
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 94f27e5..569ea5b 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -269,6 +269,7 @@ struct nfs4_layoutcommit_res {
>  	struct nfs_fattr *fattr;
>  	const struct nfs_server *server;
>  	struct nfs4_sequence_res seq_res;
> +	int status;
>  };
>  
>  struct nfs4_layoutcommit_data {


* Re: [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments
  2011-07-27 18:40 ` [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments Jim Rees
@ 2011-07-27 20:05   ` Boaz Harrosh
  2011-07-27 20:22     ` Trond Myklebust
  0 siblings, 1 reply; 55+ messages in thread
From: Boaz Harrosh @ 2011-07-27 20:05 UTC
  To: Trond Myklebust; +Cc: Jim Rees, linux-nfs, peter honeyman

On 07/27/2011 11:40 AM, Jim Rees wrote:
> From: Peng Tao <bergwolf@gmail.com>
> 
> Some layout drivers like block will have multiple segments.  Generic code
> should be able to handle it.  Layoutcommit takes a list of segments and last
> write offset is saved at inode level.
> 
> Signed-off-by: Peng Tao <peng_tao@emc.com>

Trond Hi

Could you please add this patch to the current merge window, and also include

CC: Stable <stable@vger.kernel.org>

Without it, the objects layout driver does not work with current pNFS object servers (e.g. exofs).
I posted a variant of this patch back at the Bake-a-thon, and I have only run tests
successfully with it included.

Also the next patch:
	[PATCH v3 04/25] NFS41: save layoutcommit cred after first successful layoutget

can be added too, since the two are closely related. I have usually run with both of them applied.

I think both apply cleanly to a v3.0 kernel.

(I have two more bug-fix patches for the objects layout driver that need the same treatment;
I'll send them later today.)

Thanks
Boaz

> ---
>  fs/nfs/nfs4filelayout.c |    2 +-
>  fs/nfs/nfs4proc.c       |    8 ++++++-
>  fs/nfs/nfs4xdr.c        |    2 +-
>  fs/nfs/pnfs.c           |   50 ++++++++++++++++++++++++----------------------
>  fs/nfs/pnfs.h           |    5 +++-
>  include/linux/nfs_xdr.h |    2 +-
>  6 files changed, 40 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
> index be93a62..e8915d4 100644
> --- a/fs/nfs/nfs4filelayout.c
> +++ b/fs/nfs/nfs4filelayout.c
> @@ -170,7 +170,7 @@ filelayout_set_layoutcommit(struct nfs_write_data *wdata)
>  
>  	pnfs_set_layoutcommit(wdata);
>  	dprintk("%s ionde %lu pls_end_pos %lu\n", __func__, wdata->inode->i_ino,
> -		(unsigned long) wdata->lseg->pls_end_pos);
> +		(unsigned long) NFS_I(wdata->inode)->layout->plh_lwb);
>  }
>  
>  /*
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index ebb6f1a..af32d3d 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5960,9 +5960,15 @@ nfs4_layoutcommit_done(struct rpc_task *task, void *calldata)
>  static void nfs4_layoutcommit_release(void *calldata)
>  {
>  	struct nfs4_layoutcommit_data *data = calldata;
> +	struct pnfs_layout_segment *lseg, *tmp;
>  
>  	/* Matched by references in pnfs_set_layoutcommit */
> -	put_lseg(data->lseg);
> +	list_for_each_entry_safe(lseg, tmp, &data->lseg_list, pls_lc_list) {
> +		list_del_init(&lseg->pls_lc_list);
> +		if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT,
> +				       &lseg->pls_flags))
> +			put_lseg(lseg);
> +	}
>  	put_rpccred(data->cred);
>  	kfree(data);
>  }
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index a82dd40..c149cbe 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -1955,7 +1955,7 @@ encode_layoutcommit(struct xdr_stream *xdr,
>  	*p++ = cpu_to_be32(OP_LAYOUTCOMMIT);
>  	/* Only whole file layouts */
>  	p = xdr_encode_hyper(p, 0); /* offset */
> -	p = xdr_encode_hyper(p, NFS4_MAX_UINT64); /* length */
> +	p = xdr_encode_hyper(p, args->lastbytewritten + 1);	/* length */
>  	*p++ = cpu_to_be32(0); /* reclaim */
>  	p = xdr_encode_opaque_fixed(p, args->stateid.data, NFS4_STATEID_SIZE);
>  	*p++ = cpu_to_be32(1); /* newoffset = TRUE */
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 8e72724..dae19dd 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -236,6 +236,7 @@ static void
>  init_lseg(struct pnfs_layout_hdr *lo, struct pnfs_layout_segment *lseg)
>  {
>  	INIT_LIST_HEAD(&lseg->pls_list);
> +	INIT_LIST_HEAD(&lseg->pls_lc_list);
>  	atomic_set(&lseg->pls_refcount, 1);
>  	smp_mb();
>  	set_bit(NFS_LSEG_VALID, &lseg->pls_flags);
> @@ -1362,16 +1363,17 @@ pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc)
>  EXPORT_SYMBOL_GPL(pnfs_generic_pg_readpages);
>  
>  /*
> - * Currently there is only one (whole file) write lseg.
> + * There can be multiple RW segments.
>   */
> -static struct pnfs_layout_segment *pnfs_list_write_lseg(struct inode *inode)
> +static void pnfs_list_write_lseg(struct inode *inode, struct list_head *listp)
>  {
> -	struct pnfs_layout_segment *lseg, *rv = NULL;
> +	struct pnfs_layout_segment *lseg;
>  
> -	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list)
> -		if (lseg->pls_range.iomode == IOMODE_RW)
> -			rv = lseg;
> -	return rv;
> +	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list) {
> +		if (lseg->pls_range.iomode == IOMODE_RW &&
> +		    test_bit(NFS_LSEG_LAYOUTCOMMIT, &lseg->pls_flags))
> +			list_add(&lseg->pls_lc_list, listp);
> +	}
>  }
>  
>  void
> @@ -1383,17 +1385,21 @@ pnfs_set_layoutcommit(struct nfs_write_data *wdata)
>  
>  	spin_lock(&nfsi->vfs_inode.i_lock);
>  	if (!test_and_set_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
> -		/* references matched in nfs4_layoutcommit_release */
> -		get_lseg(wdata->lseg);
> -		wdata->lseg->pls_lc_cred =
> -			get_rpccred(wdata->args.context->state->owner->so_cred);
>  		mark_as_dirty = true;
> +		nfsi->layout->plh_lc_cred =
> +			get_rpccred(wdata->args.context->state->owner->so_cred);
>  		dprintk("%s: Set layoutcommit for inode %lu ",
>  			__func__, wdata->inode->i_ino);
>  	}
> -	if (end_pos > wdata->lseg->pls_end_pos)
> -		wdata->lseg->pls_end_pos = end_pos;
> +	if (!test_and_set_bit(NFS_LSEG_LAYOUTCOMMIT, &wdata->lseg->pls_flags)) {
> +		/* references matched in nfs4_layoutcommit_release */
> +		get_lseg(wdata->lseg);
> +	}
> +	if (end_pos > nfsi->layout->plh_lwb)
> +		nfsi->layout->plh_lwb = end_pos;
>  	spin_unlock(&nfsi->vfs_inode.i_lock);
> +	dprintk("%s: lseg %p end_pos %llu\n",
> +		__func__, wdata->lseg, nfsi->layout->plh_lwb);
>  
>  	/* if pnfs_layoutcommit_inode() runs between inode locks, the next one
>  	 * will be a noop because NFS_INO_LAYOUTCOMMIT will not be set */
> @@ -1415,7 +1421,6 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync)
>  {
>  	struct nfs4_layoutcommit_data *data;
>  	struct nfs_inode *nfsi = NFS_I(inode);
> -	struct pnfs_layout_segment *lseg;
>  	struct rpc_cred *cred;
>  	loff_t end_pos;
>  	int status = 0;
> @@ -1433,29 +1438,26 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync)
>  		goto out;
>  	}
>  
> +	INIT_LIST_HEAD(&data->lseg_list);
>  	spin_lock(&inode->i_lock);
>  	if (!test_and_clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
>  		spin_unlock(&inode->i_lock);
>  		kfree(data);
>  		goto out;
>  	}
> -	/*
> -	 * Currently only one (whole file) write lseg which is referenced
> -	 * in pnfs_set_layoutcommit and will be found.
> -	 */
> -	lseg = pnfs_list_write_lseg(inode);
>  
> -	end_pos = lseg->pls_end_pos;
> -	cred = lseg->pls_lc_cred;
> -	lseg->pls_end_pos = 0;
> -	lseg->pls_lc_cred = NULL;
> +	pnfs_list_write_lseg(inode, &data->lseg_list);
> +
> +	end_pos = nfsi->layout->plh_lwb;
> +	cred = nfsi->layout->plh_lc_cred;
> +	nfsi->layout->plh_lwb = 0;
> +	nfsi->layout->plh_lc_cred = NULL;
>  
>  	memcpy(&data->args.stateid.data, nfsi->layout->plh_stateid.data,
>  		sizeof(nfsi->layout->plh_stateid.data));
>  	spin_unlock(&inode->i_lock);
>  
>  	data->args.inode = inode;
> -	data->lseg = lseg;
>  	data->cred = cred;
>  	nfs_fattr_init(&data->fattr);
>  	data->args.bitmask = NFS_SERVER(inode)->cache_consistency_bitmask;
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 23d8267..5f1b532 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -36,16 +36,17 @@
>  enum {
>  	NFS_LSEG_VALID = 0,	/* cleared when lseg is recalled/returned */
>  	NFS_LSEG_ROC,		/* roc bit received from server */
> +	NFS_LSEG_LAYOUTCOMMIT,	/* layoutcommit bit set for layoutcommit */
>  };
>  
>  struct pnfs_layout_segment {
>  	struct list_head pls_list;
> +	struct list_head pls_lc_list;
>  	struct pnfs_layout_range pls_range;
>  	atomic_t pls_refcount;
>  	unsigned long pls_flags;
>  	struct pnfs_layout_hdr *pls_layout;
>  	struct rpc_cred	*pls_lc_cred; /* LAYOUTCOMMIT credential */
> -	loff_t pls_end_pos; /* LAYOUTCOMMIT write end */
>  };
>  
>  enum pnfs_try_status {
> @@ -128,6 +129,8 @@ struct pnfs_layout_hdr {
>  	unsigned long		plh_block_lgets; /* block LAYOUTGET if >0 */
>  	u32			plh_barrier; /* ignore lower seqids */
>  	unsigned long		plh_flags;
> +	loff_t			plh_lwb; /* last write byte for layoutcommit */
> +	struct rpc_cred		*plh_lc_cred; /* layoutcommit cred */
>  	struct inode		*plh_inode;
>  };
>  
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index a07b682..21f333e 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -273,7 +273,7 @@ struct nfs4_layoutcommit_res {
>  struct nfs4_layoutcommit_data {
>  	struct rpc_task task;
>  	struct nfs_fattr fattr;
> -	struct pnfs_layout_segment *lseg;
> +	struct list_head lseg_list;
>  	struct rpc_cred *cred;
>  	struct nfs4_layoutcommit_args args;
>  	struct nfs4_layoutcommit_res res;


* Re: [PATCH v3 20/25] pnfsblock: bl_read_pagelist
  2011-07-27 18:40 ` [PATCH v3 20/25] pnfsblock: bl_read_pagelist Jim Rees
@ 2011-07-27 20:09   ` Trond Myklebust
  0 siblings, 0 replies; 55+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:09 UTC
  To: Jim Rees; +Cc: linux-nfs, peter honeyman

On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote: 
> From: Fred Isaman <iisaman@citi.umich.edu>
> 
> Note: When the upper layer's read/write request cannot be fulfilled, the block
> layout driver shouldn't silently mark the page as error. It should do
> what can be done and leave the rest to the upper layer. To do so, we
> should set rdata/wdata->res.count properly.
> 
> When the upper layer resends the read/write request to finish the rest
> of the request, pgbase is the position where we should start.
> 
> [pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED]
> Signed-off-by: Peng Tao <peng_tao@emc.com>
> [pnfsblock: read path error handling]
> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
> [pnfsblock: handle errors when read or write pagelist.]
> Signed-off-by: Zhang Jingwang <yyalone@gmail.com>
> [pnfs-block: use new read_pagelist api]
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
> ---
>  fs/nfs/blocklayout/blocklayout.c |  282 ++++++++++++++++++++++++++++++++++++++
>  1 files changed, 282 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index c63296d..6e90371 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -29,10 +29,12 @@
>   * of the software, even if it has been or is hereafter advised of the
>   * possibility of such damages.
>   */
> +
>  #include <linux/module.h>
>  #include <linux/init.h>
>  #include <linux/mount.h>
>  #include <linux/namei.h>
> +#include <linux/bio.h>		/* struct bio */
>  
>  #include "blocklayout.h"
>  
> @@ -45,9 +47,289 @@ MODULE_DESCRIPTION("The NFSv4.1 pNFS Block layout driver");
>  struct dentry *bl_device_pipe;
>  wait_queue_head_t bl_wq;
>  
> +static void print_page(struct page *page)
> +{
> +	dprintk("PRINTPAGE page %p\n", page);
> +	dprintk("	PagePrivate %d\n", PagePrivate(page));
> +	dprintk("	PageUptodate %d\n", PageUptodate(page));
> +	dprintk("	PageError %d\n", PageError(page));
> +	dprintk("	PageDirty %d\n", PageDirty(page));
> +	dprintk("	PageReferenced %d\n", PageReferenced(page));
> +	dprintk("	PageLocked %d\n", PageLocked(page));
> +	dprintk("	PageWriteback %d\n", PageWriteback(page));
> +	dprintk("	PageMappedToDisk %d\n", PageMappedToDisk(page));
> +	dprintk("\n");
> +}
> +
> +/* Given the be associated with isect, determine if page data needs to be
> + * initialized.
> + */
> +static int is_hole(struct pnfs_block_extent *be, sector_t isect)
> +{
> +	if (be->be_state == PNFS_BLOCK_NONE_DATA)
> +		return 1;
> +	else if (be->be_state != PNFS_BLOCK_INVALID_DATA)
> +		return 0;
> +	else
> +		return !is_sector_initialized(be->be_inval, isect);
> +}
> +
> +static int
> +dont_like_caller(struct nfs_page *req)
> +{
> +	if (atomic_read(&req->wb_complete)) {
> +		/* Called by _multi */
> +		return 1;
> +	} else {
> +		/* Called by _one */
> +		return 0;
> +	}
> +}

Why is this still needed? If everything was set up correctly in the
nfs_pageio_ops, then you should never get to bl_read_pagelist if you
don't like the block size. 
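A userspace sketch of the idea, assuming a hypothetical request model and a power-of-two layout block size. The eligibility check runs when requests are coalesced (a pg_test-style hook), so a request the block driver cannot handle is never added to the pNFS I/O in the first place, and bl_read_pagelist() needs no dont_like_caller() test. The alignment rule is an illustrative assumption, not the driver's actual policy:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a page request: byte offset and length in the file. */
struct model_req {
	uint64_t offset;
	uint32_t len;
};

/* pg_test-style predicate: may @req join the pNFS I/O that currently ends
 * with @prev (NULL if the I/O is empty)? Refusing here, rather than bailing
 * out later in ->read_pagelist, lets the generic code send the request
 * down another path. @blksize must be a power of two. */
static bool model_bl_pg_test(const struct model_req *prev,
			     const struct model_req *req, uint32_t blksize)
{
	assert(blksize != 0 && (blksize & (blksize - 1)) == 0);

	if (prev && prev->offset + prev->len != req->offset)
		return false;		/* not contiguous with what we have */

	/* Both ends of the request must fall on a block boundary. */
	return (req->offset & (blksize - 1)) == 0 &&
	       (req->len & (blksize - 1)) == 0;
}
```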


* Re: [PATCH v3 21/25] pnfsblock: bl_write_pagelist
  2011-07-27 18:40 ` [PATCH v3 21/25] pnfsblock: bl_write_pagelist Jim Rees
@ 2011-07-27 20:11   ` Trond Myklebust
  0 siblings, 0 replies; 55+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:11 UTC
  To: Jim Rees; +Cc: linux-nfs, peter honeyman

On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote: 
> From: Fred Isaman <iisaman@citi.umich.edu>
> 
> Note: When the upper layer's read/write request cannot be fulfilled, the block
> layout driver shouldn't silently mark the page as error. It should do
> what can be done and leave the rest to the upper layer. To do so, we
> should set rdata/wdata->res.count properly.
> 
> When the upper layer resends the read/write request to finish the rest
> of the request, pgbase is the position where we should start.
> 
> [pnfsblock: bl_write_pagelist support functions]
> [pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS]
> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
> [pnfsblock: handle errors when read or write pagelist.]
> Signed-off-by: Zhang Jingwang <yyalone@gmail.com>
> [pnfs-block: use new write_pagelist api]
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
> 
> [SQUASHME: pnfsblock: mds_offset is set in the generic layer]
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
> 
> [pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED]
> Signed-off-by: Peng Tao <peng_tao@emc.com>
> [pnfsblock: SQUASHME: adjust to API change]
> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
> [pnfsblock: fixup blksize alignment in bl_setup_layoutcommit]
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
> [pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS]
> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
> [pnfsblock: handle errors when read or write pagelist.]
> Signed-off-by: Zhang Jingwang <yyalone@gmail.com>
> [pnfs-block: use new write_pagelist api]
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
> ---
>  fs/nfs/blocklayout/blocklayout.c |  137 +++++++++++++++++++++++++++++++++++++-
>  1 files changed, 134 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 6e90371..963b3ea 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -74,6 +74,19 @@ static int is_hole(struct pnfs_block_extent *be, sector_t isect)
>  		return !is_sector_initialized(be->be_inval, isect);
>  }
>  
> +/* Given the be associated with isect, determine if page data can be
> + * written to disk.
> + */
> +static int is_writable(struct pnfs_block_extent *be, sector_t isect)
> +{
> +	if (be->be_state == PNFS_BLOCK_READWRITE_DATA)
> +		return 1;
> +	else if (be->be_state != PNFS_BLOCK_INVALID_DATA)
> +		return 0;
> +	else
> +		return is_sector_initialized(be->be_inval, isect);
> +}
> +
>  static int
>  dont_like_caller(struct nfs_page *req)
>  {
> @@ -333,11 +346,129 @@ out:
>  	return PNFS_NOT_ATTEMPTED;
>  }
>  
> +/* This is basically copied from mpage_end_io_read */
> +static void bl_end_io_write(struct bio *bio, int err)
> +{
> +	struct parallel_io *par = bio->bi_private;
> +	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
> +	struct nfs_write_data *wdata = (struct nfs_write_data *)par->data;
> +
> +	if (!uptodate) {
> +		if (!wdata->pnfs_error)
> +			wdata->pnfs_error = -EIO;
> +		bl_set_lo_fail(wdata->lseg);
> +	}
> +	bio_put(bio);
> +	put_parallel(par);
> +}
> +
> +/* Function scheduled for call during bl_end_par_io_write,
> + * it marks sectors as written and extends the commitlist.
> + */
> +static void bl_write_cleanup(struct work_struct *work)
> +{
> +	struct rpc_task *task;
> +	struct nfs_write_data *wdata;
> +	dprintk("%s enter\n", __func__);
> +	task = container_of(work, struct rpc_task, u.tk_work);
> +	wdata = container_of(task, struct nfs_write_data, task);
> +	pnfs_ld_write_done(wdata);
> +}
> +
> +/* Called when last of bios associated with a bl_write_pagelist call finishes */
> +static void
> +bl_end_par_io_write(void *data)
> +{
> +	struct nfs_write_data *wdata = data;
> +
> +	/* STUB - ignoring error handling */
> +	wdata->task.tk_status = 0;
> +	wdata->verf.committed = NFS_FILE_SYNC;
> +	INIT_WORK(&wdata->task.u.tk_work, bl_write_cleanup);
> +	schedule_work(&wdata->task.u.tk_work);
> +}
> +
>  static enum pnfs_try_status
> -bl_write_pagelist(struct nfs_write_data *wdata,
> -		  int sync)
> +bl_write_pagelist(struct nfs_write_data *wdata, int sync)
>  {
> -	return PNFS_NOT_ATTEMPTED;
> +	int i;
> +	struct bio *bio = NULL;
> +	struct pnfs_block_extent *be = NULL;
> +	sector_t isect, extent_length = 0;
> +	struct parallel_io *par;
> +	loff_t offset = wdata->args.offset;
> +	size_t count = wdata->args.count;
> +	struct page **pages = wdata->args.pages;
> +	int pg_index = wdata->args.pgbase >> PAGE_CACHE_SHIFT;
> +
> +	dprintk("%s enter, %Zu@%lld\n", __func__, count, offset);
> +	if (!wdata->lseg) {
> +		dprintk("%s no lseg, falling back to MDS\n", __func__);
> +		return PNFS_NOT_ATTEMPTED;
> +	}

Should never happen in the new code.

> +	if (dont_like_caller(wdata->req)) {
> +		dprintk("%s dont_like_caller failed\n", __func__);
> +		return PNFS_NOT_ATTEMPTED;
> +	}

Should never happen...
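
For readers following the extent-state logic in is_writable() above, here is the decision table it encodes:
- A READWRITE extent is always writable.
- Any state other than INVALID is never writable.
- An INVALID extent is writable only for sectors already marked initialized.

The userspace model below reproduces that table. It is a sketch under stated assumptions. The `model_` names, the numeric enum values and the 64-bit `init_bitmap` standing in for be_inval are all hypothetical simplifications, not the driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extent states named after the pNFS block layout states; the numeric
 * values are arbitrary and only meaningful inside this sketch. */
enum model_extent_state {
	MODEL_READWRITE_DATA,	/* allocated and valid on disk */
	MODEL_READ_DATA,	/* valid, but read-only */
	MODEL_INVALID_DATA,	/* allocated, contents not yet valid */
	MODEL_NONE_DATA,	/* hole: nothing allocated */
};

/* Hypothetical stand-in for struct pnfs_block_extent: a 64-bit bitmap
 * replaces be_inval and tracks which sectors count as initialized. */
struct model_extent {
	enum model_extent_state state;
	uint64_t first_sector;	/* first sector covered by the extent */
	uint64_t init_bitmap;	/* bit i set: first_sector + i is initialized */
};

static bool model_sector_initialized(const struct model_extent *be,
				     uint64_t isect)
{
	uint64_t off = isect - be->first_sector;

	/* This toy model only covers 64 sectors per extent. */
	assert(isect >= be->first_sector && off < 64);
	return (be->init_bitmap >> off) & 1;
}

/* Same decision table as is_writable() in the patch. */
static bool model_is_writable(const struct model_extent *be, uint64_t isect)
{
	if (be->state == MODEL_READWRITE_DATA)
		return true;
	if (be->state != MODEL_INVALID_DATA)
		return false;
	return model_sector_initialized(be, isect);
}
```

Consider an INVALID extent starting at sector 100 with only bit 3 set. Sector 103 is writable; sector 104 is not.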



-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
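
The write path quoted above splits one nfs_write_data into several bios that share a struct parallel_io, and completion proceeds in stages:
1. Each bio completion calls put_parallel().
2. According to its comment, bl_end_par_io_write() runs when the last bio finishes, and it defers the final step to a workqueue.
3. bl_write_cleanup() then uses container_of() twice to get from the embedded work item back to the nfs_write_data.

The definition of struct parallel_io is not quoted here, so the model below assumes a plain counter plus callback. It also flattens the rpc_task union (u.tk_work) into a plain member and calls the work function inline instead of scheduling it. All `model_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Classic container_of: recover the enclosing struct from a pointer to
 * one of its members (the kernel's version adds type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_work {
	void (*func)(struct model_work *work);
};

/* Hypothetical stand-ins for rpc_task and nfs_write_data: the work item
 * is embedded in the task, which is embedded in the write data. */
struct model_task {
	int tk_status;
	struct model_work tk_work;
};

struct model_wdata {
	int done;			/* set by the cleanup step */
	struct model_task task;
};

/* Assumed shape of struct parallel_io: outstanding-bio count plus a
 * callback. A plain int is enough for this single-threaded model; the
 * kernel needs an atomic or locked counter. */
struct model_par {
	int refcnt;
	void (*pnfs_callback)(void *data);
	void *data;
};

/* Mirrors bl_write_cleanup(): two container_of() steps back to wdata. */
static void model_write_cleanup(struct model_work *work)
{
	struct model_task *task = container_of(work, struct model_task, tk_work);
	struct model_wdata *wdata = container_of(task, struct model_wdata, task);

	wdata->done = 1;
}

/* Mirrors bl_end_par_io_write(): record status, then run the cleanup. */
static void model_end_par_io_write(void *data)
{
	struct model_wdata *wdata = data;

	wdata->task.tk_status = 0;
	wdata->task.tk_work.func = model_write_cleanup;
	/* The kernel calls schedule_work(); this sketch runs it inline. */
	wdata->task.tk_work.func(&wdata->task.tk_work);
}

/* Called once per completed bio; the last caller fires the callback. */
static void model_put_par(struct model_par *par)
{
	assert(par->refcnt > 0);
	if (--par->refcnt == 0)
		par->pnfs_callback(par->data);
}
```

With three bios outstanding, the cleanup runs only on the third model_put_par() call.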


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 22/25] pnfsblock: note written INVAL areas for layoutcommit
  2011-07-27 18:40 ` [PATCH v3 22/25] pnfsblock: note written INVAL areas for layoutcommit Jim Rees
@ 2011-07-27 20:13   ` Trond Myklebust
  2011-07-28  2:30     ` tao.peng
  0 siblings, 1 reply; 55+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:13 UTC (permalink / raw)
  To: Jim Rees; +Cc: linux-nfs, peter honeyman

On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote: 
> From: Fred Isaman <iisaman@citi.umich.edu>
> 
> [SQUASHME: pnfs: blocklayout: port block layout code]
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ?




> +int mark_for_commit(struct pnfs_block_extent *be,
> +		    sector_t offset, sector_t length);

This is an externally visible function, so it needs a prefix.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com



* Re: [PATCH v3 23/25] pnfsblock: use pageio_ops api
  2011-07-27 18:40 ` [PATCH v3 23/25] pnfsblock: use pageio_ops api Jim Rees
@ 2011-07-27 20:15   ` Trond Myklebust
  2011-07-28  2:27     ` tao.peng
  0 siblings, 1 reply; 55+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:15 UTC (permalink / raw)
  To: Jim Rees; +Cc: linux-nfs, peter honeyman

On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote: 
> From: Benny Halevy <bhalevy@panasas.com>
> 
> [pnfsblock: use pnfs_generic_pg_init_read/write]
> Signed-off-by: Peng Tao <peng_tao@emc.com>
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> ---
>  fs/nfs/blocklayout/blocklayout.c |   14 ++++++++++++++
>  1 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 22de05c..502d384 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -762,6 +762,18 @@ bl_clear_layoutdriver(struct nfs_server *server)
>  	return 0;
>  }
>  
> +static const struct nfs_pageio_ops bl_pg_read_ops = {
> +	.pg_init = pnfs_generic_pg_init_read,
> +	.pg_test = pnfs_generic_pg_test,
> +	.pg_doio = pnfs_generic_pg_readpages,
> +};
> +
> +static const struct nfs_pageio_ops bl_pg_write_ops = {
> +	.pg_init = pnfs_generic_pg_init_write,
> +	.pg_test = pnfs_generic_pg_test,
> +	.pg_doio = pnfs_generic_pg_writepages,
> +};
> +
>  static struct pnfs_layoutdriver_type blocklayout_type = {
>  	.id				= LAYOUT_BLOCK_VOLUME,
>  	.name				= "LAYOUT_BLOCK_VOLUME",
> @@ -775,6 +787,8 @@ static struct pnfs_layoutdriver_type blocklayout_type = {
>  	.cleanup_layoutcommit		= bl_cleanup_layoutcommit,
>  	.set_layoutdriver		= bl_set_layoutdriver,
>  	.clear_layoutdriver		= bl_clear_layoutdriver,
> +	.pg_read_ops			= &bl_pg_read_ops,
> +	.pg_write_ops			= &bl_pg_write_ops,
>  };
>  
How did this work previously?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
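
The patch wires the block driver into the generic page-I/O path through two const tables of function pointers. Those tables are referenced from struct pnfs_layoutdriver_type. The sketch below isolates that ops-table pattern.

The fallback to a default table when a driver leaves the pointer NULL is a hypothetical design added for illustration. It is not a claim about how the generic pNFS code treated drivers without pg_read_ops/pg_write_ops at the time, which is what Trond is asking. All `model_` names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down nfs_pageio_ops: only the I/O submission hook. */
struct model_pageio_ops {
	int (*pg_doio)(void);	/* returns which path handled the I/O */
};

enum { MODEL_PATH_MDS = 0, MODEL_PATH_PNFS = 1 };

static int model_mds_doio(void)  { return MODEL_PATH_MDS; }
static int model_pnfs_doio(void) { return MODEL_PATH_PNFS; }

static const struct model_pageio_ops model_default_ops = {
	.pg_doio = model_mds_doio,
};

/* Analogue of bl_pg_write_ops: a const table owned by the driver. */
static const struct model_pageio_ops model_bl_write_ops = {
	.pg_doio = model_pnfs_doio,
};

/* Cut-down layout driver descriptor; pg_write_ops may be left NULL. */
struct model_layoutdriver {
	const char *name;
	const struct model_pageio_ops *pg_write_ops;
};

/* Hypothetical generic-layer lookup with a fallback to the default table. */
static const struct model_pageio_ops *
model_select_write_ops(const struct model_layoutdriver *ld)
{
	const struct model_pageio_ops *ops = &model_default_ops;

	if (ld != NULL && ld->pg_write_ops != NULL)
		ops = ld->pg_write_ops;
	assert(ops->pg_doio != NULL);	/* every table must provide pg_doio */
	return ops;
}
```

Keeping the tables const and static means each driver shares one read-only instance. The generic code only ever dereferences the pointer it was handed.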



* Re: [PATCH v3 25/25] NFS41: Drop lseg ref before fallthru to MDS
  2011-07-27 18:40 ` [PATCH v3 25/25] NFS41: Drop lseg ref before fallthru to MDS Jim Rees
@ 2011-07-27 20:16   ` Trond Myklebust
  0 siblings, 0 replies; 55+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:16 UTC (permalink / raw)
  To: Jim Rees; +Cc: linux-nfs, peter honeyman

On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote: 
> From: Peng Tao <bergwolf@gmail.com>
> 
> There is no need to keep lseg reference when read/write through MDS.
> This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc
> because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg
> is not NULL.
> 
> Signed-off-by: Peng Tao <peng_tao@emc.com>
> ---
>  fs/nfs/pnfs.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 3b20753..fda3019 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1182,6 +1182,9 @@ pnfs_ld_write_done(struct nfs_write_data *data)
>  
>  	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
>  		data->pnfs_error);
> +
> +	put_lseg(data->lseg);
> +	data->lseg = NULL;
>  	status = nfs_initiate_write(data, NFS_CLIENT(data->inode),
>  				    data->mds_ops, NFS_FILE_SYNC);

As I've said before: the real bug here is calling
nfs_initiate_write()...

> 	return status ? : -EAGAIN;
> @@ -1282,6 +1285,9 @@ pnfs_ld_read_done(struct nfs_read_data *data)
>  
>  	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
>  		data->pnfs_error);
> +
> +	put_lseg(data->lseg);
> +	data->lseg = NULL;
>  	status = nfs_initiate_read(data, NFS_CLIENT(data->inode),
>  				   data->mds_ops);
>  	return status ? : -EAGAIN;

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
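
The commit message above explains the crash:
- nfs4_proc_write_setup() unsets wdata->res.fattr whenever wdata->lseg is not NULL.
- An MDS retry that still carries the lseg therefore ends in a NULL pointer crash in nfs_post_op_update_inode_force_wcc.
- Dropping the reference and clearing the pointer before the retry lets the setup code take the plain MDS path.

The model below shows that put-then-NULL pattern. It also shows the GNU `status ? : -EAGAIN` idiom, which is shorthand for `status ? status : -EAGAIN`, spelled in portable C. The structs and functions are hypothetical simplifications.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_lseg {
	int refcount;
};

struct model_fattr {
	int valid;
};

struct model_write_data {
	struct model_lseg *lseg;	/* non-NULL means "pNFS I/O" */
	struct model_fattr *fattr;	/* attributes the MDS reply fills in */
};

static void model_put_lseg(struct model_lseg *lseg)
{
	if (lseg != NULL) {
		assert(lseg->refcount > 0);
		lseg->refcount--;
	}
}

/* Mimics the behaviour described in the commit message: setup drops the
 * fattr pointer when it believes the write goes through a layout. */
static void model_write_setup(struct model_write_data *data)
{
	if (data->lseg != NULL)
		data->fattr = NULL;
}

/* Retry through the MDS: release the layout segment first, so setup no
 * longer treats this as a pNFS write. initiate_status stands in for the
 * return value of nfs_initiate_write(). */
static int model_retry_via_mds(struct model_write_data *data,
			       int initiate_status)
{
	model_put_lseg(data->lseg);
	data->lseg = NULL;
	model_write_setup(data);
	/* Portable spelling of GNU "initiate_status ? : -EAGAIN". */
	return initiate_status ? initiate_status : -EAGAIN;
}
```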



* Re: [PATCH v3 06/25] pnfs: cleanup_layoutcommit
  2011-07-27 19:53   ` Trond Myklebust
@ 2011-07-27 20:20     ` Boaz Harrosh
  2011-07-27 20:25       ` Trond Myklebust
  0 siblings, 1 reply; 55+ messages in thread
From: Boaz Harrosh @ 2011-07-27 20:20 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jim Rees, linux-nfs, peter honeyman

On 07/27/2011 12:53 PM, Trond Myklebust wrote:
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index bddd8b9..f271425 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -113,6 +113,9 @@ struct pnfs_layoutdriver_type {
>>  				     struct xdr_stream *xdr,
>>  				     const struct nfs4_layoutreturn_args *args);
>>  
>> +	void (*cleanup_layoutcommit) (struct pnfs_layout_hdr *layoutid,
>> +				      struct nfs4_layoutcommit_data *data);
>> +
>>  	void (*encode_layoutcommit) (struct pnfs_layout_hdr *layoutid,
>>  				     struct xdr_stream *xdr,
>>  				     const struct nfs4_layoutcommit_args *args);
> 
> This really needs to go. We should have
> 
>       int (*layoutcommit)()...
> 
> instead of 'encode' and 'cleanup' methods...
> 

Theoretically it is not possible because the blocks-layout protocol mandates
different handling depending on the "error" response from the Server which
will be received on RPC done.

However, the last time I dug into this code, the proper handling was missing.
(That was a while back, though, so things might have changed.)

Boaz
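
Boaz's point is that the generic layer issues LAYOUTCOMMIT, so the layout driver gets control twice:
- once when the RPC arguments are encoded (encode_layoutcommit);
- once after the reply status is known (cleanup_layoutcommit).

The block layout needs the second hook because its cleanup depends on that status. A minimal userspace model of the split follows. The phase handling, the `model_` names and the "two extents" payload are hypothetical simplifications, not the NFS client's RPC machinery.

```c
#include <assert.h>
#include <stddef.h>

struct model_lc_data {
	int encoded_extents;	/* what the driver put on the wire */
	int status;		/* RPC reply status, 0 on success */
	int committed;		/* driver-side bookkeeping after the reply */
};

/* Reduced driver ops: two optional hooks, one per RPC phase. */
struct model_ld_ops {
	void (*encode_layoutcommit)(struct model_lc_data *d);
	void (*cleanup_layoutcommit)(struct model_lc_data *d);
};

static void model_blk_encode(struct model_lc_data *d)
{
	d->encoded_extents = 2;		/* e.g. two written INVAL ranges */
}

static void model_blk_cleanup(struct model_lc_data *d)
{
	/* Only mark the extents committed if the server accepted them. */
	if (d->status == 0)
		d->committed = d->encoded_extents;
}

/* Generic layer: owns the RPC and calls the driver at setup and at done. */
static int model_layoutcommit(const struct model_ld_ops *ops,
			      struct model_lc_data *d, int reply_status)
{
	assert(ops != NULL);
	if (ops->encode_layoutcommit)		/* rpc "setup" phase */
		ops->encode_layoutcommit(d);
	d->status = reply_status;		/* stands in for the round trip */
	if (ops->cleanup_layoutcommit)		/* rpc "done" phase */
		ops->cleanup_layoutcommit(d);
	return d->status;
}
```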



* Re: [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments
  2011-07-27 20:05   ` Boaz Harrosh
@ 2011-07-27 20:22     ` Trond Myklebust
  2011-07-27 20:33       ` Boaz Harrosh
  0 siblings, 1 reply; 55+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:22 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Jim Rees, linux-nfs, peter honeyman

On Wed, 2011-07-27 at 13:05 -0700, Boaz Harrosh wrote: 
> On 07/27/2011 11:40 AM, Jim Rees wrote:
> > From: Peng Tao <bergwolf@gmail.com>
> > 
> > Some layout drivers like block will have multiple segments.  Generic code
> > should be able to handle it.  Layoutcommit takes a list of segments and last
> > write offset is saved at inode level.
> > 
> > Signed-off-by: Peng Tao <peng_tao@emc.com>
> 
> Trond Hi
> 
> Could you please add this patch to the current Merge window? And also include
> 
> CC: Stable <stable@vger.kernel.org>

Nope.

Firstly it exacerbates an existing bug in the current code (see my
comments).

Secondly, it does several things at once (adding multiple segment
handling, fixing last-byte handling in layoutcommit and credential
handling). Which of these features are actually needed for stable?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com



* Re: [PATCH v3 06/25] pnfs: cleanup_layoutcommit
  2011-07-27 20:20     ` Boaz Harrosh
@ 2011-07-27 20:25       ` Trond Myklebust
  2011-07-27 20:42         ` Boaz Harrosh
  0 siblings, 1 reply; 55+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:25 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Jim Rees, linux-nfs, peter honeyman

On Wed, 2011-07-27 at 13:20 -0700, Boaz Harrosh wrote: 
> On 07/27/2011 12:53 PM, Trond Myklebust wrote:
> >> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> >> index bddd8b9..f271425 100644
> >> --- a/fs/nfs/pnfs.h
> >> +++ b/fs/nfs/pnfs.h
> >> @@ -113,6 +113,9 @@ struct pnfs_layoutdriver_type {
> >>  				     struct xdr_stream *xdr,
> >>  				     const struct nfs4_layoutreturn_args *args);
> >>  
> >> +	void (*cleanup_layoutcommit) (struct pnfs_layout_hdr *layoutid,
> >> +				      struct nfs4_layoutcommit_data *data);
> >> +
> >>  	void (*encode_layoutcommit) (struct pnfs_layout_hdr *layoutid,
> >>  				     struct xdr_stream *xdr,
> >>  				     const struct nfs4_layoutcommit_args *args);
> > 
> > This really needs to go. We should have
> > 
> >       int (*layoutcommit)()...
> > 
> > instead of 'encode' and 'cleanup' methods...
> > 
> 
> Theoretically it is not possible because the blocks-layout protocol mandates
> different handling depending on the "error" response from the Server which
> will be received on RPC done.

???? If the blocks code is in charge of actually doing the RPC call, why
would it not be able to perform its own error handling?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com



* Re: [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments
  2011-07-27 20:22     ` Trond Myklebust
@ 2011-07-27 20:33       ` Boaz Harrosh
  2011-07-27 20:58         ` Jim Rees
  0 siblings, 1 reply; 55+ messages in thread
From: Boaz Harrosh @ 2011-07-27 20:33 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jim Rees, linux-nfs, peter honeyman

On 07/27/2011 01:22 PM, Trond Myklebust wrote:
> On Wed, 2011-07-27 at 13:05 -0700, Boaz Harrosh wrote: 
>> On 07/27/2011 11:40 AM, Jim Rees wrote:
>>> From: Peng Tao <bergwolf@gmail.com>
>>>
>>> Some layout drivers like block will have multiple segments.  Generic code
>>> should be able to handle it.  Layoutcommit takes a list of segments and last
>>> write offset is saved at inode level.
>>>
>>> Signed-off-by: Peng Tao <peng_tao@emc.com>
>>
>> Trond Hi
>>
>> Could you please add this patch to the current Merge window? And also include
>>
>> CC: Stable <stable@vger.kernel.org>
> 
> Nope.
> 
> Firstly it exacerbates an existing bug in the current code (see my
> comments).
> 
> Secondly, it does several things at once (adding multiple segment
> handling, fixing last-byte handling in layoutcommit and credential
> handling). Which of these features are actually needed for stable?
> 

OK, I agree; I thought of that too. My Bakeathon post had a narrower scope:
just the "adding multiple segment handling" and "last-byte handling" parts
(I think; let me check).

But for simplicity I thought we could take it as is, since this patchset was
already reviewed then. I've tested with this patch and it works well too. If
you are going to break things up, I can test with the minimal patches and
report what is needed in stable. (Is anyone working on it?)

Thanks
Boaz


* Re: [PATCH v3 06/25] pnfs: cleanup_layoutcommit
  2011-07-27 20:25       ` Trond Myklebust
@ 2011-07-27 20:42         ` Boaz Harrosh
  2011-07-27 20:59           ` Trond Myklebust
  0 siblings, 1 reply; 55+ messages in thread
From: Boaz Harrosh @ 2011-07-27 20:42 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jim Rees, linux-nfs, peter honeyman

On 07/27/2011 01:25 PM, Trond Myklebust wrote:
> On Wed, 2011-07-27 at 13:20 -0700, Boaz Harrosh wrote: 
>> On 07/27/2011 12:53 PM, Trond Myklebust wrote:
>>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>>> index bddd8b9..f271425 100644
>>>> --- a/fs/nfs/pnfs.h
>>>> +++ b/fs/nfs/pnfs.h
>>>> @@ -113,6 +113,9 @@ struct pnfs_layoutdriver_type {
>>>>  				     struct xdr_stream *xdr,
>>>>  				     const struct nfs4_layoutreturn_args *args);
>>>>  
>>>> +	void (*cleanup_layoutcommit) (struct pnfs_layout_hdr *layoutid,
>>>> +				      struct nfs4_layoutcommit_data *data);
>>>> +
>>>>  	void (*encode_layoutcommit) (struct pnfs_layout_hdr *layoutid,
>>>>  				     struct xdr_stream *xdr,
>>>>  				     const struct nfs4_layoutcommit_args *args);
>>>
>>> This really needs to go. We should have
>>>
>>>       int (*layoutcommit)()...
>>>
>>> instead of 'encode' and 'cleanup' methods...
>>>
>>
>> Theoretically it is not possible because the blocks-layout protocol mandates
>> different handling depending on the "error" response from the Server which
>> will be received on RPC done.
> 
> ???? If the blocks code is in charge of actually doing the RPC call, why
> would it not be able to perform its own error handling?
> 

Is it? I thought it was the generic layer that initiates the layoutcommit
(from pnfs_layoutcommit_inode, called from nfs_write_inode).

The LD only gets a chance to encode the payload at rpc-setup, and here the
blocks code needs cleanup that depends on the return status at rpc-done.

(I do think that "setup" and "done" might be better names, to reflect the
RPC states.)

Thanks
Boaz


* Re: [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments
  2011-07-27 20:33       ` Boaz Harrosh
@ 2011-07-27 20:58         ` Jim Rees
  2011-07-27 21:09           ` Boaz Harrosh
  0 siblings, 1 reply; 55+ messages in thread
From: Jim Rees @ 2011-07-27 20:58 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Trond Myklebust, linux-nfs, peter honeyman

Boaz Harrosh wrote:

  On 07/27/2011 01:22 PM, Trond Myklebust wrote:
  > On Wed, 2011-07-27 at 13:05 -0700, Boaz Harrosh wrote: 
  >> On 07/27/2011 11:40 AM, Jim Rees wrote:
  >>> From: Peng Tao <bergwolf@gmail.com>
  >>>
  >>> Some layout drivers like block will have multiple segments.  Generic code
  >>> should be able to handle it.  Layoutcommit takes a list of segments and last
  >>> write offset is saved at inode level.
  >>>
  >>> Signed-off-by: Peng Tao <peng_tao@emc.com>
  >>
  >> Trond Hi
  >>
  >> Could you please add this patch to the current Merge window? And also include
  >>
  >> CC: Stable <stable@vger.kernel.org>
  > 
  > Nope.
  > 
  > Firstly it exacerbates an existing bug in the current code (see my
  > comments).
  > 
  > Secondly, it does several things at once (adding multiple segment
  > handling, fixing last-byte handling in layoutcommit and credential
  > handling). Which of these features are actually needed for stable?
  > 
  
  OK, I agree I thought of that too. My bakethon post had a narrower scope
  to just the "adding multiple segment handling" & "last-byte handling"
  (I think let me check)
  
  But I thought for simplicity since this patchset was reviewed then, I've
  tested with this patch and it works good as well. If you are going to
  break things up, I can test with the minimal patches and report what is
  need in Stable. (Is anyone working on it?)

We may have over-squashed that one for block layout.  I can dig up the
original patches if necessary, but I'm guessing you probably have them.


* Re: [PATCH v3 06/25] pnfs: cleanup_layoutcommit
  2011-07-27 20:42         ` Boaz Harrosh
@ 2011-07-27 20:59           ` Trond Myklebust
  2011-07-27 21:33             ` Boaz Harrosh
  0 siblings, 1 reply; 55+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:59 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Jim Rees, linux-nfs, peter honeyman

On Wed, 2011-07-27 at 13:42 -0700, Boaz Harrosh wrote: 
> On 07/27/2011 01:25 PM, Trond Myklebust wrote:
> > On Wed, 2011-07-27 at 13:20 -0700, Boaz Harrosh wrote: 
> >> On 07/27/2011 12:53 PM, Trond Myklebust wrote:
> >>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> >>>> index bddd8b9..f271425 100644
> >>>> --- a/fs/nfs/pnfs.h
> >>>> +++ b/fs/nfs/pnfs.h
> >>>> @@ -113,6 +113,9 @@ struct pnfs_layoutdriver_type {
> >>>>  				     struct xdr_stream *xdr,
> >>>>  				     const struct nfs4_layoutreturn_args *args);
> >>>>  
> >>>> +	void (*cleanup_layoutcommit) (struct pnfs_layout_hdr *layoutid,
> >>>> +				      struct nfs4_layoutcommit_data *data);
> >>>> +
> >>>>  	void (*encode_layoutcommit) (struct pnfs_layout_hdr *layoutid,
> >>>>  				     struct xdr_stream *xdr,
> >>>>  				     const struct nfs4_layoutcommit_args *args);
> >>>
> >>> This really needs to go. We should have
> >>>
> >>>       int (*layoutcommit)()...
> >>>
> >>> instead of 'encode' and 'cleanup' methods...
> >>>
> >>
> >> Theoretically it is not possible because the blocks-layout protocol mandates
> >> different handling depending on the "error" response from the Server which
> >> will be received on RPC done.
> > 
> > ???? If the blocks code is in charge of actually doing the RPC call, why
> > would it not be able to perform its own error handling?
> > 
> 
> Is it? I thought it was the Generic layer that was Initiating the layoutcommit
> (From the pnfs_layoutcommit_inode called from nfs_write_inode)
> 
> The LD only has a chance to encode the payload on rpc-setup and here the blocks
> code needs cleanup depending on the return-status of rpc-done
> 
> [I do think that setup and done might be better names to reflect the rpc states)

I'm suggesting replacing the version in the generic layer with
per-layout-type variants. When the only thing that is common between the
3 variants is a couple of lines of xdr, then it doesn't make sense IMO
to try to share.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
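
Trond's alternative is a single per-layout-type method that owns the whole LAYOUTCOMMIT. The driver then acts on the reply itself instead of exposing separate encode and cleanup hooks.

The sketch below models that shape with a hypothetical `int (*layoutcommit)(...)` pointer and a stubbed RPC function. It is one reading of the proposal, not code from the thread. It leaves open Boaz's question of which level the per-driver split should happen at.

```c
#include <assert.h>
#include <stddef.h>

struct model_inode {
	int dirty_extents;	/* written ranges awaiting LAYOUTCOMMIT */
};

/* Stand-in for encoding, sending and waiting for the RPC reply. */
typedef int (*model_rpc_fn)(int nextents);

/* Proposed shape: one method, no separate encode/cleanup hooks. */
struct model_ld_type {
	int (*layoutcommit)(struct model_inode *inode, model_rpc_fn rpc);
};

/* A block-style driver: sends, then handles the status itself. */
static int model_blk_layoutcommit(struct model_inode *inode, model_rpc_fn rpc)
{
	int status = rpc(inode->dirty_extents);

	if (status == 0)
		inode->dirty_extents = 0;	/* committed: forget them */
	return status;				/* on error, keep for retry */
}

static int model_rpc_ok(int nextents)   { (void)nextents; return 0; }
static int model_rpc_fail(int nextents) { (void)nextents; return -5; }

/* Generic layer only dispatches; it no longer knows about the payload. */
static int model_pnfs_layoutcommit(const struct model_ld_type *ld,
				   struct model_inode *inode, model_rpc_fn rpc)
{
	assert(ld != NULL && ld->layoutcommit != NULL);
	return ld->layoutcommit(inode, rpc);
}
```

The trade-off Boaz raises still applies. Any housekeeping now shared in the generic layer would have to be repeated or factored into helpers that each driver calls.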



* Re: [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments
  2011-07-27 20:58         ` Jim Rees
@ 2011-07-27 21:09           ` Boaz Harrosh
  2011-07-27 23:40             ` Jim Rees
  0 siblings, 1 reply; 55+ messages in thread
From: Boaz Harrosh @ 2011-07-27 21:09 UTC (permalink / raw)
  To: Jim Rees; +Cc: Trond Myklebust, linux-nfs, peter honeyman

On 07/27/2011 01:58 PM, Jim Rees wrote:
>   But I thought for simplicity since this patchset was reviewed then, I've
>   tested with this patch and it works good as well. If you are going to
>   break things up, I can test with the minimal patches and report what is
>   need in Stable. (Is anyone working on it?)
> 
> We may have over-squashed that one for block layout.  I can dig up the
> original patches if necessary, but I'm guessing you probably have them.

Please do, because by now I'm confused. The first time I discovered it was
through a git diff of a branch that worked, and I lost track of where the
code originated.

But Trond, just for simplicity's sake, can't we submit the patch as is?
It is all related to the layoutcommit breakage, and it is all bits that we
eventually need, for both objects and blocks (and files, once they do segments).
I don't think it is so "bloated" that it can't be sent to stable.
What are the risks you are anticipating?

Thanks
Boaz


* Re: [PATCH v3 06/25] pnfs: cleanup_layoutcommit
  2011-07-27 20:59           ` Trond Myklebust
@ 2011-07-27 21:33             ` Boaz Harrosh
  0 siblings, 0 replies; 55+ messages in thread
From: Boaz Harrosh @ 2011-07-27 21:33 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jim Rees, linux-nfs, peter honeyman

On 07/27/2011 01:59 PM, Trond Myklebust wrote:
> On Wed, 2011-07-27 at 13:42 -0700, Boaz Harrosh wrote: 
>> On 07/27/2011 01:25 PM, Trond Myklebust wrote:
>>> On Wed, 2011-07-27 at 13:20 -0700, Boaz Harrosh wrote: 
>>>> On 07/27/2011 12:53 PM, Trond Myklebust wrote:
>>>>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>>>>> index bddd8b9..f271425 100644
>>>>>> --- a/fs/nfs/pnfs.h
>>>>>> +++ b/fs/nfs/pnfs.h
>>>>>> @@ -113,6 +113,9 @@ struct pnfs_layoutdriver_type {
>>>>>>  				     struct xdr_stream *xdr,
>>>>>>  				     const struct nfs4_layoutreturn_args *args);
>>>>>>  
>>>>>> +	void (*cleanup_layoutcommit) (struct pnfs_layout_hdr *layoutid,
>>>>>> +				      struct nfs4_layoutcommit_data *data);
>>>>>> +
>>>>>>  	void (*encode_layoutcommit) (struct pnfs_layout_hdr *layoutid,
>>>>>>  				     struct xdr_stream *xdr,
>>>>>>  				     const struct nfs4_layoutcommit_args *args);
>>>>>
>>>>> This really needs to go. We should have
>>>>>
>>>>>       int (*layoutcommit)()...
>>>>>
>>>>> instead of 'encode' and 'cleanup' methods...
>>>>>
>>>>
>>>> Theoretically it is not possible because the blocks-layout protocol mandates
>>>> different handling depending on the "error" response from the Server which
>>>> will be received on RPC done.
>>>
>>> ???? If the blocks code is in charge of actually doing the RPC call, why
>>> would it not be able to perform its own error handling?
>>>
>>
>> Is it? I thought it was the Generic layer that was Initiating the layoutcommit
>> (From the pnfs_layoutcommit_inode called from nfs_write_inode)
>>
>> The LD only has a chance to encode the payload on rpc-setup and here the blocks
>> code needs cleanup depending on the return-status of rpc-done
>>
>> [I do think that setup and done might be better names to reflect the rpc states)
> 
> I'm suggesting replacing the version in the generic layer with
> per-layout-type variants. When the only thing that is common between the
> 3 variants is a couple of lines of xdr, then it doesn't make sense IMO
> to try to share.
> 

You lost me. What are you suggesting we replace? pnfs_layoutcommit_inode?
nfs4_proc_layoutcommit? nfs4_xdr_enc_layoutcommit? At what level do you want
to switch to per-LD handling?

If I look at the whole layoutcommit stack, the actual XDR encoding is the
smallest part of the common code; the housekeeping is most of it. I do not
see a clear point in the current code where we can make a clean cut, that
is, a place that:
1. already has an XDR buffer to encode into, and
2. also sees the return code from layoutcommit_done.

We used to allocate the layoutcommit buffer twice, and that didn't solve the
problem either. These are two stages of the RPC state machine, and I don't
see how they can be bridged.

Boaz


* Re: [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments
  2011-07-27 21:09           ` Boaz Harrosh
@ 2011-07-27 23:40             ` Jim Rees
  2011-07-28  0:12               ` Boaz Harrosh
  0 siblings, 1 reply; 55+ messages in thread
From: Jim Rees @ 2011-07-27 23:40 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Trond Myklebust, linux-nfs, peter honeyman

Boaz Harrosh wrote:

  On 07/27/2011 01:58 PM, Jim Rees wrote:
  >   But I thought for simplicity since this patchset was reviewed then, I've
  >   tested with this patch and it works good as well. If you are going to
  >   break things up, I can test with the minimal patches and report what is
  >   need in Stable. (Is anyone working on it?)
  > 
  > We may have over-squashed that one for block layout.  I can dig up the
  > original patches if necessary, but I'm guessing you probably have them.
  
  Please do. Because by now I'm confused. the first I discovered it was
  by git diff of a branch that worked. I lost tracked of where the code
  was originated from.

Here is just the original multiple segments patch.  This is from the
88-patch set I sent out before Bakeathon.  I don't know if this would still
apply separately.

From fc3381fa66adfcee4203b5a55ca6605bcfc8dafb Mon Sep 17 00:00:00 2001
From: Peng Tao <bergwolf@gmail.com>
Date: Tue, 31 May 2011 00:42:56 -0400
Subject: [PATCH 03/89] pnfs: let layoutcommit code handle multiple segments

Some layout drivers, like block, will have multiple segments.
Generic code should be able to handle it.
---
 fs/nfs/pnfs.c |   15 ++++++++++++---
 fs/nfs/pnfs.h |    1 +
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e3d618b..c18cda0 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1193,10 +1193,18 @@ pnfs_try_to_read_data(struct nfs_read_data *rdata,
 static struct pnfs_layout_segment *pnfs_list_write_lseg(struct inode *inode)
 {
 	struct pnfs_layout_segment *lseg, *rv = NULL;
+	loff_t max_pos = 0;
+
+	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list) {
+		if (lseg->pls_range.iomode == IOMODE_RW) {
+			if (max_pos < lseg->pls_end_pos)
+				max_pos = lseg->pls_end_pos;
+			if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT, &lseg->pls_flags))
+				rv = lseg;
+		}
+	}
+	rv->pls_end_pos = max_pos;
 
-	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list)
-		if (lseg->pls_range.iomode == IOMODE_RW)
-			rv = lseg;
 	return rv;
 }
 
@@ -1211,6 +1219,7 @@ pnfs_set_layoutcommit(struct nfs_write_data *wdata)
 	if (!test_and_set_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
 		/* references matched in nfs4_layoutcommit_release */
 		get_lseg(wdata->lseg);
+		set_bit(NFS_LSEG_LAYOUTCOMMIT, &wdata->lseg->pls_flags);
 		wdata->lseg->pls_lc_cred =
 			get_rpccred(wdata->args.context->state->owner->so_cred);
 		mark_as_dirty = true;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 53819d0..c6a9af2 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -36,6 +36,7 @@
 enum {
 	NFS_LSEG_VALID = 0,	/* cleared when lseg is recalled/returned */
 	NFS_LSEG_ROC,		/* roc bit received from server */
+	NFS_LSEG_LAYOUTCOMMIT,	/* layoutcommit bit set for layoutcommit */
 };
 
 struct pnfs_layout_segment {
-- 
1.7.4.2
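
The posted pnfs_list_write_lseg() has two weaknesses:
- rv stays NULL when no IOMODE_RW segment has NFS_LSEG_LAYOUTCOMMIT set, and `rv->pls_end_pos = max_pos;` then dereferences NULL.
- Because each match overwrites rv, the function returns whichever flagged segment happens to come last in the list.

The userspace sketch below keeps the same scan: take the maximum end position over read/write segments, and clear-and-select flagged segments. It adds only the NULL check before the write-back. The array in place of the kernel list and the `model_` names are assumptions for illustration. This is not the fix that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_iomode { MODEL_IOMODE_READ, MODEL_IOMODE_RW };

struct model_lseg {
	enum model_iomode iomode;
	long long end_pos;	/* models pls_end_pos */
	bool layoutcommit;	/* models the NFS_LSEG_LAYOUTCOMMIT bit */
};

/* Same scan as the posted patch, plus a NULL check before the write-back. */
static struct model_lseg *model_list_write_lseg(struct model_lseg *segs,
						size_t n)
{
	struct model_lseg *rv = NULL;
	long long max_pos = 0;
	size_t i;

	assert(segs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct model_lseg *lseg = &segs[i];

		if (lseg->iomode != MODEL_IOMODE_RW)
			continue;
		if (max_pos < lseg->end_pos)
			max_pos = lseg->end_pos;
		if (lseg->layoutcommit) {	/* test_and_clear_bit() */
			lseg->layoutcommit = false;
			rv = lseg;
		}
	}
	if (rv != NULL)		/* the posted patch omits this check */
		rv->end_pos = max_pos;
	return rv;
}
```

A second call with no flagged read/write segments returns NULL instead of crashing. Callers still have to handle that case.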


* Re: [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments
  2011-07-27 23:40             ` Jim Rees
@ 2011-07-28  0:12               ` Boaz Harrosh
  2011-07-28  2:02                 ` tao.peng
  2011-07-28  3:19                 ` Jim Rees
  0 siblings, 2 replies; 55+ messages in thread
From: Boaz Harrosh @ 2011-07-28  0:12 UTC (permalink / raw)
  To: Jim Rees; +Cc: Trond Myklebust, linux-nfs, peter honeyman

On 07/27/2011 04:40 PM, Jim Rees wrote:
> Boaz Harrosh wrote:
> 
>   On 07/27/2011 01:58 PM, Jim Rees wrote:
>   >   But I thought for simplicity since this patchset was reviewed then, I've
>   >   tested with this patch and it works good as well. If you are going to
>   >   break things up, I can test with the minimal patches and report what is
>   >   need in Stable. (Is anyone working on it?)
>   > 
>   > We may have over-squashed that one for block layout.  I can dig up the
>   > original patches if necessary, but I'm guessing you probably have them.
>   
>   Please do. Because by now I'm confused. the first I discovered it was
>   by git diff of a branch that worked. I lost tracked of where the code
>   was originated from.
> 
> Here is just the original multiple segments patch.  This is from the
> 88-patch set I sent out before Bakeathon.  I don't know if this would still
> apply separately.
> 

OK, thanks; that one looks familiar enough. I'll test with just this one.

Where did the other changes come from? Is there a branch of Benny's that has
them separated, so I can have a look? I suspect I also need the one that
encodes the last_byte_written.

Thanks
Boaz

> From fc3381fa66adfcee4203b5a55ca6605bcfc8dafb Mon Sep 17 00:00:00 2001
> From: Peng Tao <bergwolf@gmail.com>
> Date: Tue, 31 May 2011 00:42:56 -0400
> Subject: [PATCH 03/89] pnfs: let layoutcommit code handle multiple segments
> 
> Some layout driver like block will have multiple segments.
> Generic code should be able to handle it.
> ---
>  fs/nfs/pnfs.c |   15 ++++++++++++---
>  fs/nfs/pnfs.h |    1 +
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index e3d618b..c18cda0 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1193,10 +1193,18 @@ pnfs_try_to_read_data(struct nfs_read_data *rdata,
>  static struct pnfs_layout_segment *pnfs_list_write_lseg(struct inode *inode)
>  {
>  	struct pnfs_layout_segment *lseg, *rv = NULL;
> +	loff_t max_pos = 0;
> +
> +	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list) {
> +		if (lseg->pls_range.iomode == IOMODE_RW) {
> +			if (max_pos < lseg->pls_end_pos)
> +				max_pos = lseg->pls_end_pos;
> +			if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT, &lseg->pls_flags))
> +				rv = lseg;
> +		}
> +	}
> +	rv->pls_end_pos = max_pos;
>  
> -	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list)
> -		if (lseg->pls_range.iomode == IOMODE_RW)
> -			rv = lseg;
>  	return rv;
>  }
>  
> @@ -1211,6 +1219,7 @@ pnfs_set_layoutcommit(struct nfs_write_data *wdata)
>  	if (!test_and_set_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
>  		/* references matched in nfs4_layoutcommit_release */
>  		get_lseg(wdata->lseg);
> +		set_bit(NFS_LSEG_LAYOUTCOMMIT, &wdata->lseg->pls_flags);
>  		wdata->lseg->pls_lc_cred =
>  			get_rpccred(wdata->args.context->state->owner->so_cred);
>  		mark_as_dirty = true;
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 53819d0..c6a9af2 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -36,6 +36,7 @@
>  enum {
>  	NFS_LSEG_VALID = 0,	/* cleared when lseg is recalled/returned */
>  	NFS_LSEG_ROC,		/* roc bit received from server */
> +	NFS_LSEG_LAYOUTCOMMIT,	/* layoutcommit bit set for layoutcommit */
>  };
>  
>  struct pnfs_layout_segment {
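
The pnfs_set_layoutcommit() hunk relies on test_and_set_bit() returning the bit's previous value. Only the writer that flips NFS_INO_LAYOUTCOMMIT from 0 to 1 takes the extra lseg reference, which the code comment says is matched in nfs4_layoutcommit_release. With this patch, that writer also sets NFS_LSEG_LAYOUTCOMMIT on the lseg. Entries in the pnfs.h enum, such as NFS_LSEG_LAYOUTCOMMIT, are bit numbers, not masks.

Below is a userspace model using C11 atomics in place of the kernel bitops. `model_test_and_set_bit()` and the other names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers, as in the pnfs.h enum; masks are built with 1UL << nr. */
enum { MODEL_INO_LAYOUTCOMMIT = 0 };
enum { MODEL_LSEG_VALID = 0, MODEL_LSEG_ROC, MODEL_LSEG_LAYOUTCOMMIT };

/* Atomically sets bit nr and returns its previous value. */
static bool model_test_and_set_bit(int nr, atomic_ulong *addr)
{
	unsigned long mask;

	assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * CHAR_BIT));
	mask = 1UL << nr;
	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

struct model_lseg {
	atomic_ulong flags;
	int refcount;	/* only the winning caller below touches this */
};

/* Only the first caller after the inode flag was clear takes a reference. */
static bool model_set_layoutcommit(atomic_ulong *ino_flags,
				   struct model_lseg *lseg)
{
	if (model_test_and_set_bit(MODEL_INO_LAYOUTCOMMIT, ino_flags))
		return false;		/* a layoutcommit is already pending */
	lseg->refcount++;		/* get_lseg() */
	model_test_and_set_bit(MODEL_LSEG_LAYOUTCOMMIT, &lseg->flags);
	return true;
}
```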



* RE: [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments
  2011-07-28  0:12               ` Boaz Harrosh
@ 2011-07-28  2:02                 ` tao.peng
  2011-07-28  3:19                 ` Jim Rees
  1 sibling, 0 replies; 55+ messages in thread
From: tao.peng @ 2011-07-28  2:02 UTC (permalink / raw)
  To: bharrosh, rees; +Cc: Trond.Myklebust, linux-nfs, honey

Hi, Boaz,

> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org]
> On Behalf Of Boaz Harrosh
> Sent: Thursday, July 28, 2011 8:12 AM
> To: Jim Rees
> Cc: Trond Myklebust; linux-nfs@vger.kernel.org; peter honeyman
> Subject: Re: [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments
> 
> On 07/27/2011 04:40 PM, Jim Rees wrote:
> > Boaz Harrosh wrote:
> >
> >   On 07/27/2011 01:58 PM, Jim Rees wrote:
> >   >   But I thought for simplicity since this patchset was reviewed then, I've
> >   >   tested with this patch and it works good as well. If you are going to
> >   >   break things up, I can test with the minimal patches and report what is
> >   >   need in Stable. (Is anyone working on it?)
> >   >
> >   > We may have over-squashed that one for block layout.  I can dig up the
> >   > original patches if necessary, but I'm guessing you probably have them.
> >
> >   Please do. Because by now I'm confused. the first I discovered it was
> >   by git diff of a branch that worked. I lost tracked of where the code
> >   was originated from.
> >
> > Here is just the original multiple segments patch.  This is from the
> > 88-patch set I sent out before Bakeathon.  I don't know if this would still
> > apply separately.
> >
> 
> OK Thanks that one looks familiar enough. I'll test just with this one.
As Fred pointed out earlier, this one has some problem of returning a random lseg. I will split the final version and you can test which patch you need for stable.

Cheers,
Tao

> 
> Where did the other changes come from? Is there a Benny's branch that
> have them separated so I can have a look. I suspect I also need the one
> that encodes the last_byte_written.
> 
> Thanks
> Boaz
> 
> > From fc3381fa66adfcee4203b5a55ca6605bcfc8dafb Mon Sep 17 00:00:00 2001
> > From: Peng Tao <bergwolf@gmail.com>
> > Date: Tue, 31 May 2011 00:42:56 -0400
> > Subject: [PATCH 03/89] pnfs: let layoutcommit code handle multiple segments
> >
> > Some layout driver like block will have multiple segments.
> > Generic code should be able to handle it.
> > ---
> >  fs/nfs/pnfs.c |   15 ++++++++++++---
> >  fs/nfs/pnfs.h |    1 +
> >  2 files changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > index e3d618b..c18cda0 100644
> > --- a/fs/nfs/pnfs.c
> > +++ b/fs/nfs/pnfs.c
> > @@ -1193,10 +1193,18 @@ pnfs_try_to_read_data(struct nfs_read_data *rdata,
> >  static struct pnfs_layout_segment *pnfs_list_write_lseg(struct inode *inode)
> >  {
> >  	struct pnfs_layout_segment *lseg, *rv = NULL;
> > +	loff_t max_pos = 0;
> > +
> > +	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list) {
> > +		if (lseg->pls_range.iomode == IOMODE_RW) {
> > +			if (max_pos < lseg->pls_end_pos)
> > +				max_pos = lseg->pls_end_pos;
> > +			if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT, &lseg->pls_flags))
> > +				rv = lseg;
> > +		}
> > +	}
> > +	rv->pls_end_pos = max_pos;
> >
> > -	list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list)
> > -		if (lseg->pls_range.iomode == IOMODE_RW)
> > -			rv = lseg;
> >  	return rv;
> >  }
> >
> > @@ -1211,6 +1219,7 @@ pnfs_set_layoutcommit(struct nfs_write_data *wdata)
> >  	if (!test_and_set_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
> >  		/* references matched in nfs4_layoutcommit_release */
> >  		get_lseg(wdata->lseg);
> > +		set_bit(NFS_LSEG_LAYOUTCOMMIT, &wdata->lseg->pls_flags);
> >  		wdata->lseg->pls_lc_cred =
> >  			get_rpccred(wdata->args.context->state->owner->so_cred);
> >  		mark_as_dirty = true;
> > diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> > index 53819d0..c6a9af2 100644
> > --- a/fs/nfs/pnfs.h
> > +++ b/fs/nfs/pnfs.h
> > @@ -36,6 +36,7 @@
> >  enum {
> >  	NFS_LSEG_VALID = 0,	/* cleared when lseg is recalled/returned */
> >  	NFS_LSEG_ROC,		/* roc bit received from server */
> > +	NFS_LSEG_LAYOUTCOMMIT,	/* layoutcommit bit set for layoutcommit */
> >  };
> >
> >  struct pnfs_layout_segment {
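
The selection logic in the quoted pnfs_list_write_lseg() hunk can be modelled in userspace C. This is a simplified sketch under stated assumptions, not the kernel code: struct seg, its fields and list_write_seg() are hypothetical stand-ins for pnfs_layout_segment, pls_end_pos, the NFS_LSEG_LAYOUTCOMMIT bit and the list walk, and the bit test here is not atomic. The sketch also adds a NULL check, because as quoted the hunk stores max_pos through rv even when no IOMODE_RW segment carried the bit, in which case rv is still NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum iomode { IOMODE_READ = 1, IOMODE_RW = 2 };

/* Hypothetical stand-in for a layout segment on the inode's list. */
struct seg {
	enum iomode iomode;
	long long   end_pos;   /* models pls_end_pos */
	bool        lc_bit;    /* models NFS_LSEG_LAYOUTCOMMIT in pls_flags */
};

/*
 * Walk the segments: remember the largest end position among RW
 * segments and pick the (last) RW segment whose layoutcommit bit was
 * set, clearing the bit on every RW segment that had it.
 */
static struct seg *list_write_seg(struct seg *segs, size_t n)
{
	struct seg *rv = NULL;
	long long max_pos = 0;

	assert(segs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (segs[i].iomode != IOMODE_RW)
			continue;
		if (max_pos < segs[i].end_pos)
			max_pos = segs[i].end_pos;
		if (segs[i].lc_bit) {          /* non-atomic test-and-clear */
			segs[i].lc_bit = false;
			rv = &segs[i];
		}
	}
	/* The quoted hunk does this store unconditionally; guard it here. */
	if (rv != NULL)
		rv->end_pos = max_pos;
	return rv;
}
```

With an RW segment ending at 400, a flagged READ segment, and a flagged RW segment ending at 300, the flagged RW segment is returned with its end raised to 400; a second call finds no flagged RW segment and returns NULL instead of dereferencing it.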


* RE: [PATCH v3 23/25] pnfsblock: use pageio_ops api
  2011-07-27 20:15   ` Trond Myklebust
@ 2011-07-28  2:27     ` tao.peng
  2011-07-28  3:52       ` Trond Myklebust
  0 siblings, 1 reply; 55+ messages in thread
From: tao.peng @ 2011-07-28  2:27 UTC (permalink / raw)
  To: Trond.Myklebust, rees; +Cc: linux-nfs, honey

> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org]
> On Behalf Of Trond Myklebust
> Sent: Thursday, July 28, 2011 4:15 AM
> To: Jim Rees
> Cc: linux-nfs@vger.kernel.org; peter honeyman
> Subject: Re: [PATCH v3 23/25] pnfsblock: use pageio_ops api
> 
> On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote:
> > From: Benny Halevy <bhalevy@panasas.com>
> >
> > [pnfsblock: use pnfs_generic_pg_init_read/write]
> > Signed-off-by: Peng Tao <peng_tao@emc.com>
> > Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> > ---
> >  fs/nfs/blocklayout/blocklayout.c |   14 ++++++++++++++
> >  1 files changed, 14 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> > index 22de05c..502d384 100644
> > --- a/fs/nfs/blocklayout/blocklayout.c
> > +++ b/fs/nfs/blocklayout/blocklayout.c
> > @@ -762,6 +762,18 @@ bl_clear_layoutdriver(struct nfs_server *server)
> >  	return 0;
> >  }
> >
> > +static const struct nfs_pageio_ops bl_pg_read_ops = {
> > +	.pg_init = pnfs_generic_pg_init_read,
> > +	.pg_test = pnfs_generic_pg_test,
> > +	.pg_doio = pnfs_generic_pg_readpages,
> > +};
> > +
> > +static const struct nfs_pageio_ops bl_pg_write_ops = {
> > +	.pg_init = pnfs_generic_pg_init_write,
> > +	.pg_test = pnfs_generic_pg_test,
> > +	.pg_doio = pnfs_generic_pg_writepages,
> > +};
> > +
> >  static struct pnfs_layoutdriver_type blocklayout_type = {
> >  	.id				= LAYOUT_BLOCK_VOLUME,
> >  	.name				= "LAYOUT_BLOCK_VOLUME",
> > @@ -775,6 +787,8 @@ static struct pnfs_layoutdriver_type blocklayout_type = {
> >  	.cleanup_layoutcommit		= bl_cleanup_layoutcommit,
> >  	.set_layoutdriver		= bl_set_layoutdriver,
> >  	.clear_layoutdriver		= bl_clear_layoutdriver,
> > +	.pg_read_ops			= &bl_pg_read_ops,
> > +	.pg_write_ops			= &bl_pg_write_ops,
> >  };
> >
> How did this work previously?
block layout used to use .pg_test interface in pnfs_layoutdriver_type but it got removed in the 3.0 merge window.

Thanks,
Tao


* RE: [PATCH v3 22/25] pnfsblock: note written INVAL areas for layoutcommit
  2011-07-27 20:13   ` Trond Myklebust
@ 2011-07-28  2:30     ` tao.peng
  0 siblings, 0 replies; 55+ messages in thread
From: tao.peng @ 2011-07-28  2:30 UTC (permalink / raw)
  To: Trond.Myklebust, rees; +Cc: linux-nfs, honey

> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org]
> On Behalf Of Trond Myklebust
> Sent: Thursday, July 28, 2011 4:13 AM
> To: Jim Rees
> Cc: linux-nfs@vger.kernel.org; peter honeyman
> Subject: Re: [PATCH v3 22/25] pnfsblock: note written INVAL areas for layoutcommit
> 
> On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote:
> > From: Fred Isaman <iisaman@citi.umich.edu>
> >
> > [SQUASHME: pnfs: blocklayout: port block layout code]
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ?
This is from [PATCH 87/89] SQUASHME: pnfs: blocklayout: port block layout code, which was a large patch that made necessary change to fix up broken block layout code since last merge window.
Yes, I agree the title is misleading. Will remove it from the commit log.

Thanks,
Tao

> 
> > +int mark_for_commit(struct pnfs_block_extent *be,
> > +		    sector_t offset, sector_t length);
> 
> This is an externally visible function, so it needs a prefix.
Will do it. Thanks.


* Re: [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments
  2011-07-28  0:12               ` Boaz Harrosh
  2011-07-28  2:02                 ` tao.peng
@ 2011-07-28  3:19                 ` Jim Rees
  1 sibling, 0 replies; 55+ messages in thread
From: Jim Rees @ 2011-07-28  3:19 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Trond Myklebust, linux-nfs, peter honeyman

Boaz Harrosh wrote:

  Where did the other changes come from? Is there a Benny's branch that
  have them separated so I can have a look. I suspect I also need the one
  that encodes the last_byte_written.

I have pushed this series to the pnfs-88 branch in my repo:
git://citi.umich.edu/projects/linux-pnfs-blk.git


* RE: [PATCH v3 04/25] NFS41: save layoutcommit cred after first successful layoutget
  2011-07-27 19:47   ` Trond Myklebust
@ 2011-07-28  3:29     ` tao.peng
  0 siblings, 0 replies; 55+ messages in thread
From: tao.peng @ 2011-07-28  3:29 UTC (permalink / raw)
  To: Trond.Myklebust, rees; +Cc: linux-nfs, honey

> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org]
> On Behalf Of Trond Myklebust
> Sent: Thursday, July 28, 2011 3:47 AM
> To: Jim Rees
> Cc: linux-nfs@vger.kernel.org; peter honeyman
> Subject: Re: [PATCH v3 04/25] NFS41: save layoutcommit cred after first successful layoutget
> 
> On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote:
> > From: Peng Tao <bergwolf@gmail.com>
> >
> > Save it in layout header instead of in every segment.
> >
> > Signed-off-by: Peng Tao <peng_tao@emc.com>
> > ---
> >  fs/nfs/pnfs.c |   12 ++++++------
> >  fs/nfs/pnfs.h |    1 -
> >  2 files changed, 6 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > index dae19dd..fbebd2a 100644
> > --- a/fs/nfs/pnfs.c
> > +++ b/fs/nfs/pnfs.c
> > @@ -211,6 +211,7 @@ destroy_layout_hdr(struct pnfs_layout_hdr *lo)
> >  	dprintk("%s: freeing layout cache %p\n", __func__, lo);
> >  	BUG_ON(!list_empty(&lo->plh_layouts));
> >  	NFS_I(lo->plh_inode)->layout = NULL;
> > +	put_rpccred(lo->plh_lc_cred);
> >  	pnfs_free_layout_hdr(lo);
> >  }
> >
> > @@ -1007,6 +1008,10 @@ pnfs_update_layout(struct inode *ino,
> >  		list_del_init(&lo->plh_layouts);
> >  		spin_unlock(&clp->cl_lock);
> >  	}
> > +	if (first) {
> > +		lo->plh_lc_cred =
> > +			get_rpccred(ctx->state->owner->so_cred);
> > +	}
> 
> Hmm... This whole 'first' thing is inherently racy. What if two threads
> are calling pnfs_update_layout at the same time: they both see
> list_empty(&lo->plh_segs), and both conclude that they need to
> initialise the layout header. _BOOM_....
> 
> The correct thing to do here is either to initialise the layout header
> correctly inside pnfs_find_alloc_layout(), or to add a flag for
> ->plh_flags that can be used with test_and_clear_bit() to get _one_ of
> the processes to perform the initialisation.
> Personally, I'd prefer the former.
Will move it inside layout header init. Thanks.

Cheers,
Tao
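
Trond's second suggestion, a bit in ->plh_flags claimed with test_and_clear_bit() so that exactly one caller initialises the header, can be illustrated with a userspace sketch. C11 atomic_fetch_and() stands in for the kernel bitop, and struct layout_hdr, HDR_NEEDS_INIT, race_init() and the other names are hypothetical. Tao agreed to the former option (moving the initialisation into layout header init), so this only shows why the flag approach avoids the double initialisation; it is not the code that was adopted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define HDR_NEEDS_INIT 0x1u   /* hypothetical bit, set when the header is allocated */

/* Hypothetical stand-in for a layout header. */
struct layout_hdr {
	atomic_uint flags;
	atomic_int  init_count;   /* how many callers ran the init code */
	int         lc_cred;      /* placeholder for the saved credential */
};

/* Atomically clear @bit; return nonzero if it was set beforehand. */
static int test_and_clear(atomic_uint *flags, unsigned int bit)
{
	return (atomic_fetch_and(flags, ~bit) & bit) != 0;
}

/* Only the caller that actually clears the bit initialises the header. */
static void hdr_maybe_init(struct layout_hdr *lo, int cred)
{
	if (test_and_clear(&lo->flags, HDR_NEEDS_INIT)) {
		lo->lc_cred = cred;
		atomic_fetch_add(&lo->init_count, 1);
	}
}

static void *worker(void *arg)
{
	hdr_maybe_init(arg, 42);
	return NULL;
}

/* Race @nthreads callers on one fresh header; return the init count, or -1. */
static int race_init(int nthreads)
{
	struct layout_hdr lo;
	pthread_t tid[16];
	int created = 0;

	assert(nthreads > 0 && nthreads <= 16);
	atomic_init(&lo.flags, HDR_NEEDS_INIT);
	atomic_init(&lo.init_count, 0);
	lo.lc_cred = 0;

	for (; created < nthreads; created++)
		if (pthread_create(&tid[created], NULL, worker, &lo) != 0)
			break;
	for (int i = 0; i < created; i++)
		pthread_join(tid[i], NULL);
	if (created != nthreads)
		return -1;
	return atomic_load(&lo.init_count);
}
```

However the threads interleave, race_init(2) and race_init(8) should each report one initialisation, because only the caller that sees the bit set in the value returned by atomic_fetch_and() runs the init code. Older toolchains may need -pthread to link.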


* RE: [PATCH v3 23/25] pnfsblock: use pageio_ops api
  2011-07-28  2:27     ` tao.peng
@ 2011-07-28  3:52       ` Trond Myklebust
  2011-07-28  4:10         ` tao.peng
  0 siblings, 1 reply; 55+ messages in thread
From: Trond Myklebust @ 2011-07-28  3:52 UTC (permalink / raw)
  To: tao.peng; +Cc: rees, linux-nfs, honey

On Wed, 2011-07-27 at 22:27 -0400, tao.peng@emc.com wrote: 
> > -----Original Message-----
> > From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org]
> > On Behalf Of Trond Myklebust
> > Sent: Thursday, July 28, 2011 4:15 AM
> > To: Jim Rees
> > Cc: linux-nfs@vger.kernel.org; peter honeyman
> > Subject: Re: [PATCH v3 23/25] pnfsblock: use pageio_ops api
> > 
> > On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote:
> > > From: Benny Halevy <bhalevy@panasas.com>
> > >
> > > [pnfsblock: use pnfs_generic_pg_init_read/write]
> > > Signed-off-by: Peng Tao <peng_tao@emc.com>
> > > Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> > > ---
> > >  fs/nfs/blocklayout/blocklayout.c |   14 ++++++++++++++
> > >  1 files changed, 14 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> > > index 22de05c..502d384 100644
> > > --- a/fs/nfs/blocklayout/blocklayout.c
> > > +++ b/fs/nfs/blocklayout/blocklayout.c
> > > @@ -762,6 +762,18 @@ bl_clear_layoutdriver(struct nfs_server *server)
> > >  	return 0;
> > >  }
> > >
> > > +static const struct nfs_pageio_ops bl_pg_read_ops = {
> > > +	.pg_init = pnfs_generic_pg_init_read,
> > > +	.pg_test = pnfs_generic_pg_test,
> > > +	.pg_doio = pnfs_generic_pg_readpages,
> > > +};
> > > +
> > > +static const struct nfs_pageio_ops bl_pg_write_ops = {
> > > +	.pg_init = pnfs_generic_pg_init_write,
> > > +	.pg_test = pnfs_generic_pg_test,
> > > +	.pg_doio = pnfs_generic_pg_writepages,
> > > +};
> > > +
> > >  static struct pnfs_layoutdriver_type blocklayout_type = {
> > >  	.id				= LAYOUT_BLOCK_VOLUME,
> > >  	.name				= "LAYOUT_BLOCK_VOLUME",
> > > @@ -775,6 +787,8 @@ static struct pnfs_layoutdriver_type blocklayout_type = {
> > >  	.cleanup_layoutcommit		= bl_cleanup_layoutcommit,
> > >  	.set_layoutdriver		= bl_set_layoutdriver,
> > >  	.clear_layoutdriver		= bl_clear_layoutdriver,
> > > +	.pg_read_ops			= &bl_pg_read_ops,
> > > +	.pg_write_ops			= &bl_pg_write_ops,
> > >  };
> > >
> > How did this work previously?
> block layout used to use .pg_test interface in pnfs_layoutdriver_type but it got removed in the 3.0 merge window.

Right, but won't the entire blocks interface blow up in your face until
you merge this patch? If it just falls back to write through MDS, then
I'm fine, but my worry is that someone who git bisects in order to track
down a patch may end up seeing new panics due to the lack of a
well-defined struct nfs_pageio_ops in blocks.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
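
Trond's condition, that a driver without page-I/O ops should fall back to the MDS rather than panic, can be modelled as an ops selector that never calls through a NULL table. This is a hypothetical sketch: struct pageio_ops, struct layout_driver, pick_ops(), do_read(), do_write() and the two tables are invented for illustration and are not the NFS client's nfs_pageio_ops machinery.

```c
#include <assert.h>
#include <stddef.h>

enum io_path { IO_MDS, IO_PNFS };

/* Hypothetical, much-reduced page-I/O ops table. */
struct pageio_ops {
	enum io_path (*doio)(void);
};

/* Hypothetical layout driver: either ops pointer may be left NULL. */
struct layout_driver {
	const struct pageio_ops *pg_read_ops;
	const struct pageio_ops *pg_write_ops;
};

static enum io_path mds_doio(void)  { return IO_MDS; }
static enum io_path pnfs_doio(void) { return IO_PNFS; }

static const struct pageio_ops mds_ops  = { .doio = mds_doio };
static const struct pageio_ops pnfs_ops = { .doio = pnfs_doio };

/* Use the driver's table if it has one, otherwise go through the MDS. */
static const struct pageio_ops *pick_ops(const struct pageio_ops *drv_ops)
{
	return drv_ops ? drv_ops : &mds_ops;
}

static enum io_path do_write(const struct layout_driver *ld)
{
	assert(ld != NULL);
	return pick_ops(ld->pg_write_ops)->doio();
}

static enum io_path do_read(const struct layout_driver *ld)
{
	assert(ld != NULL);
	return pick_ops(ld->pg_read_ops)->doio();
}
```

In this model a driver that leaves pg_write_ops unset degrades to MDS I/O instead of calling through a NULL pointer, which is the "fine" case Trond describes; a crash at a bisect point would correspond to a caller that skips the NULL check.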



* RE: [PATCH v3 23/25] pnfsblock: use pageio_ops api
  2011-07-28  3:52       ` Trond Myklebust
@ 2011-07-28  4:10         ` tao.peng
  0 siblings, 0 replies; 55+ messages in thread
From: tao.peng @ 2011-07-28  4:10 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: rees, linux-nfs, honey

> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org]
> On Behalf Of Trond Myklebust
> Sent: Thursday, July 28, 2011 11:53 AM
> To: Peng, Tao
> Cc: rees@umich.edu; linux-nfs@vger.kernel.org; honey@citi.umich.edu
> Subject: RE: [PATCH v3 23/25] pnfsblock: use pageio_ops api
> 
> On Wed, 2011-07-27 at 22:27 -0400, tao.peng@emc.com wrote:
> > > -----Original Message-----
> > > From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org]
> > > On Behalf Of Trond Myklebust
> > > Sent: Thursday, July 28, 2011 4:15 AM
> > > To: Jim Rees
> > > Cc: linux-nfs@vger.kernel.org; peter honeyman
> > > Subject: Re: [PATCH v3 23/25] pnfsblock: use pageio_ops api
> > >
> > > On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote:
> > > > From: Benny Halevy <bhalevy@panasas.com>
> > > >
> > > > [pnfsblock: use pnfs_generic_pg_init_read/write]
> > > > Signed-off-by: Peng Tao <peng_tao@emc.com>
> > > > Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> > > > ---
> > > >  fs/nfs/blocklayout/blocklayout.c |   14 ++++++++++++++
> > > >  1 files changed, 14 insertions(+), 0 deletions(-)
> > > >
> > > > diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> > > > index 22de05c..502d384 100644
> > > > --- a/fs/nfs/blocklayout/blocklayout.c
> > > > +++ b/fs/nfs/blocklayout/blocklayout.c
> > > > @@ -762,6 +762,18 @@ bl_clear_layoutdriver(struct nfs_server *server)
> > > >  	return 0;
> > > >  }
> > > >
> > > > +static const struct nfs_pageio_ops bl_pg_read_ops = {
> > > > +	.pg_init = pnfs_generic_pg_init_read,
> > > > +	.pg_test = pnfs_generic_pg_test,
> > > > +	.pg_doio = pnfs_generic_pg_readpages,
> > > > +};
> > > > +
> > > > +static const struct nfs_pageio_ops bl_pg_write_ops = {
> > > > +	.pg_init = pnfs_generic_pg_init_write,
> > > > +	.pg_test = pnfs_generic_pg_test,
> > > > +	.pg_doio = pnfs_generic_pg_writepages,
> > > > +};
> > > > +
> > > >  static struct pnfs_layoutdriver_type blocklayout_type = {
> > > >  	.id				= LAYOUT_BLOCK_VOLUME,
> > > >  	.name				= "LAYOUT_BLOCK_VOLUME",
> > > > @@ -775,6 +787,8 @@ static struct pnfs_layoutdriver_type blocklayout_type = {
> > > >  	.cleanup_layoutcommit		= bl_cleanup_layoutcommit,
> > > >  	.set_layoutdriver		= bl_set_layoutdriver,
> > > >  	.clear_layoutdriver		= bl_clear_layoutdriver,
> > > > +	.pg_read_ops			= &bl_pg_read_ops,
> > > > +	.pg_write_ops			= &bl_pg_write_ops,
> > > >  };
> > > >
> > > How did this work previously?
> > block layout used to use .pg_test interface in pnfs_layoutdriver_type but it got removed in the 3.0 merge window.
> 
> Right, but won't the entire blocks interface blow up in your face until
> you merge this patch? If it just falls back to write through MDS, then
> I'm fine, but my worry is that someone who git bisects in order to track
> down a patch may end up seeing new panics due to the lack of a
> well-defined struct nfs_pageio_ops in blocks.
I see your point. We will move this patch to earlier place so it doesn't break bisect.

Thanks,
Tao


* Re: [PATCH v3 01/25] pnfs: GETDEVICELIST
  2011-07-27 19:33   ` Trond Myklebust
@ 2011-07-28 18:27     ` Benny Halevy
  0 siblings, 0 replies; 55+ messages in thread
From: Benny Halevy @ 2011-07-28 18:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jim Rees, linux-nfs, peter honeyman

On 2011-07-27 15:33, Trond Myklebust wrote:
> On Wed, 2011-07-27 at 14:40 -0400, Jim Rees wrote: 
>> From: Andy Adamson <andros@netapp.com>

...

>> @@ -6902,6 +7029,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
>>  	PROC(GET_LEASE_TIME,	enc_get_lease_time,	dec_get_lease_time),
>>  	PROC(RECLAIM_COMPLETE,	enc_reclaim_complete,	dec_reclaim_complete),
>>  	PROC(GETDEVICEINFO,	enc_getdeviceinfo,	dec_getdeviceinfo),
>> +	PROC(GETDEVICELIST,	enc_getdevicelist,	dec_getdevicelist),
> 
> This needs to go at the end of the list of procedures in order to work
> correctly with nfsstat.
> 

FWIW, what matters is the value of NFSPROC4_CLNT_GETDEVICELIST
rather than the order of instantiation of the PROC() macro.
It is defined at the end of the respective enum in include/linux/nfs4.h.

Benny

>> 	PROC(LAYOUTGET,		enc_layoutget,		dec_layoutget),
>>  	PROC(LAYOUTCOMMIT,	enc_layoutcommit,	dec_layoutcommit),
>>  	PROC(LAYOUTRETURN,	enc_layoutreturn,	dec_layoutreturn),
> 
> 
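
Benny's point can be illustrated with a reduced model of a procedure table built with designated array initialisers: each entry is placed at the index given by its enum constant, so the order of the PROC() lines in the source does not change which slot (and therefore which statistics counter) a procedure uses. The enum, struct procinfo, this PROC() macro and proc_index() are simplified, hypothetical stand-ins; they assume, rather than reproduce, the kernel's NFSPROC4_CLNT_* and nfs4_procedures[] definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical procedure numbers; new ones are appended at the end. */
enum clnt_proc {
	CLNT_READ = 0,
	CLNT_GETDEVICEINFO,
	CLNT_LAYOUTGET,
	CLNT_GETDEVICELIST,
	CLNT_NPROCS
};

struct procinfo {
	const char   *name;
	unsigned int  statidx;   /* slot used for per-procedure counters */
};

/* Place each entry at the index named by its enum constant. */
#define PROC(proc) \
	[CLNT_##proc] = { .name = #proc, .statidx = CLNT_##proc }

static const struct procinfo procedures[CLNT_NPROCS] = {
	PROC(READ),
	PROC(GETDEVICEINFO),
	PROC(GETDEVICELIST),     /* written before LAYOUTGET ... */
	PROC(LAYOUTGET),         /* ... but slots follow the enum */
};

/* Return the slot holding @name, or -1 if no entry has that name. */
static int proc_index(const char *name)
{
	assert(name != NULL);
	for (int i = 0; i < CLNT_NPROCS; i++)
		if (procedures[i].name && strcmp(procedures[i].name, name) == 0)
			return i;
	return -1;
}
```

Here GETDEVICELIST is written before LAYOUTGET in the initialiser list, yet proc_index("GETDEVICELIST") returns 3, its enum value, and LAYOUTGET stays at slot 2.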


end of thread, other threads:[~2011-07-28 18:27 UTC | newest]

Thread overview: 55+ messages
2011-07-27 18:40 [PATCH v3 00/25] add block layout driver to pnfs client Jim Rees
2011-07-27 18:40 ` [PATCH v3 01/25] pnfs: GETDEVICELIST Jim Rees
2011-07-27 19:33   ` Trond Myklebust
2011-07-28 18:27     ` Benny Halevy
2011-07-27 18:40 ` [PATCH v3 02/25] pnfs: add set-clear layoutdriver interface Jim Rees
2011-07-27 19:36   ` Trond Myklebust
2011-07-27 18:40 ` [PATCH v3 03/25] NFS41: Let layoutcommit handle multiple segments Jim Rees
2011-07-27 20:05   ` Boaz Harrosh
2011-07-27 20:22     ` Trond Myklebust
2011-07-27 20:33       ` Boaz Harrosh
2011-07-27 20:58         ` Jim Rees
2011-07-27 21:09           ` Boaz Harrosh
2011-07-27 23:40             ` Jim Rees
2011-07-28  0:12               ` Boaz Harrosh
2011-07-28  2:02                 ` tao.peng
2011-07-28  3:19                 ` Jim Rees
2011-07-27 18:40 ` [PATCH v3 04/25] NFS41: save layoutcommit cred after first successful layoutget Jim Rees
2011-07-27 19:47   ` Trond Myklebust
2011-07-28  3:29     ` tao.peng
2011-07-27 18:40 ` [PATCH v3 05/25] pnfs: ask for layout_blksize and save it in nfs_server Jim Rees
2011-07-27 18:40 ` [PATCH v3 06/25] pnfs: cleanup_layoutcommit Jim Rees
2011-07-27 19:53   ` Trond Myklebust
2011-07-27 20:20     ` Boaz Harrosh
2011-07-27 20:25       ` Trond Myklebust
2011-07-27 20:42         ` Boaz Harrosh
2011-07-27 20:59           ` Trond Myklebust
2011-07-27 21:33             ` Boaz Harrosh
2011-07-27 18:40 ` [PATCH v3 07/25] pnfsblock: add blocklayout Kconfig option, Makefile, and stubs Jim Rees
2011-07-27 18:40 ` [PATCH v3 08/25] pnfsblock: basic extent code Jim Rees
2011-07-27 18:40 ` [PATCH v3 09/25] pnfsblock: add device operations Jim Rees
2011-07-27 18:40 ` [PATCH v3 10/25] pnfsblock: remove " Jim Rees
2011-07-27 18:40 ` [PATCH v3 11/25] pnfsblock: lseg alloc and free Jim Rees
2011-07-27 18:40 ` [PATCH v3 12/25] pnfsblock: merge extents Jim Rees
2011-07-27 18:40 ` [PATCH v3 13/25] pnfsblock: call and parse getdevicelist Jim Rees
2011-07-27 18:40 ` [PATCH v3 14/25] pnfsblock: xdr decode pnfs_block_layout4 Jim Rees
2011-07-27 18:40 ` [PATCH v3 15/25] pnfsblock: bl_find_get_extent Jim Rees
2011-07-27 18:40 ` [PATCH v3 16/25] pnfsblock: add extent manipulation functions Jim Rees
2011-07-27 18:40 ` [PATCH v3 17/25] pnfsblock: merge rw extents Jim Rees
2011-07-27 18:40 ` [PATCH v3 18/25] pnfsblock: encode_layoutcommit Jim Rees
2011-07-27 18:40 ` [PATCH v3 19/25] pnfsblock: cleanup_layoutcommit Jim Rees
2011-07-27 18:40 ` [PATCH v3 20/25] pnfsblock: bl_read_pagelist Jim Rees
2011-07-27 20:09   ` Trond Myklebust
2011-07-27 18:40 ` [PATCH v3 21/25] pnfsblock: bl_write_pagelist Jim Rees
2011-07-27 20:11   ` Trond Myklebust
2011-07-27 18:40 ` [PATCH v3 22/25] pnfsblock: note written INVAL areas for layoutcommit Jim Rees
2011-07-27 20:13   ` Trond Myklebust
2011-07-28  2:30     ` tao.peng
2011-07-27 18:40 ` [PATCH v3 23/25] pnfsblock: use pageio_ops api Jim Rees
2011-07-27 20:15   ` Trond Myklebust
2011-07-28  2:27     ` tao.peng
2011-07-28  3:52       ` Trond Myklebust
2011-07-28  4:10         ` tao.peng
2011-07-27 18:40 ` [PATCH v3 24/25] pnfsblock: write_pagelist handle zero invalid extents Jim Rees
2011-07-27 18:40 ` [PATCH v3 25/25] NFS41: Drop lseg ref before fallthru to MDS Jim Rees
2011-07-27 20:16   ` Trond Myklebust
