* [PATCHSET v5 0/38] pnfs for 2.6.40
@ 2011-05-22 23:43 Benny Halevy
  2011-05-22 23:45 ` [PATCH v5 01/38] NFSv4.1: use struct nfs_client to qualify deviceid Benny Halevy
                   ` (38 more replies)
  0 siblings, 39 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:43 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, NFS list

This version includes review comment fixes on top of Boaz's v4, sent yesterday.

In addition, it includes a generic deviceid cache used for both
layout drivers, using the pointers to the layout driver and to
the nfs_client to uniquify the deviceids.

This patchset includes SQUASHME patches for ease of review;
however, these will be squashed to produce a smaller and cleaner
patchset.

I've verified that this code compiles and passes the basic/generic/special
cthon tests with the files layout driver, but I have not yet tested it
with the objects layout.  Boaz - I'd appreciate it if you could run the
pnfs-obj tests too, thanks!

Changes in v5: (Benny)
* use struct nfs_client to qualify deviceid
* make device cache global
* use be32 res in nfs4_callback_devicenotify
* use gfp_flags
* xdr_init_decode_pages
* fix layout stateid used for layoutreturn args
* commit message for layoutret_on_setattr
* use global deviceid cache for pnfs-obj
  - revert per-mount hook and related {.un}set_layoutdriver methods
* use layout driver in global device cache

Changes in v4: (Boaz)
* See the long SQUASHME patchset I sent yesterday for all the changes,
  titled: [PATCHSET 00/13] SQUASHME pnfs-obj: Lots of changes addressing
  comments by Trond and Benny
* I have united all 3 raid-engine read/write patches into a single patch
* I've united the error-reporting and error-encoding patches into one patch
* Some checkpatch love
* Small cleanups here and there.
(I'll send a diff as reply to this mail)

Changes in v3: (Benny)
* removed direct i/o patch
* align layoutget requests on page boundaries
* fix lseg ordering
* cleanup pnfs_insert_lseg
* pnfs: clean up pnfs_find_lseg lseg arg
* remove unnecessary FIXME

Changes in v2:
* fix CB_NOTIFY_DEVICEID
* call pnfs_return_layout right before pnfs_destroy_layout
* remove assert_spin_locked from pnfs_clear_lseg_list
* remove wait parameter from the layoutreturn path.
* remove return_type field from nfs4_layoutreturn_args
* remove range from nfs4_layoutreturn_args
* no need to send layoutcommit from _pnfs_return_layout
* don't wait on sync layoutreturn
* get rid of PNFS_USE_RPC_CODE
* get rid of __nfs4_write_done_cb
* get rid of ds_[rw]size
* rename pnfs_{read,write}_done -> pnfs_ld_{read,write}_done
* reorganize and reorder the pnfs-obj patchset to expose dependencies
  and separate api changes
* some cleaning up of the pnfs-obj patches
* add xdr space reservation for pnfs-obj opaque layoutreturn
  and layoutcommit payloads

List of patches:

generic:
[PATCH v5 01/38] NFSv4.1: use struct nfs_client to qualify deviceid
[PATCH v5 02/38] pnfs: resolve header dependency in pnfs.h
[PATCH v5 03/38] NFSv4.1: make deviceid cache global
[PATCH v5 04/38] NFSv4.1: purge deviceid cache on nfs_free_client
[PATCH v5 05/38] pnfs: CB_NOTIFY_DEVICEID
[PATCH v5 06/38] SQUASHME: use be32 res in nfs4_callback_devicenotify
[PATCH v5 07/38] SQUASHME: pnfs: use nfs_client to qualify deviceid for cb_notify_deviceid
[PATCH v5 08/38] SQUASHME: pnfs: use global deviceid cache for CB_NOTIFY_DEVICEID
[PATCH v5 09/38] SQUASHME: pnfs: refactor device cache _lookup_deviceid
[PATCH v5 10/38] SQUASHME: pnfs: refactor device cache _find_get_deviceid
[PATCH v5 11/38] SUNRPC: introduce xdr_init_decode_pages
[PATCH v5 12/38] pnfs: Use byte-range for layoutget
[PATCH v5 13/38] pnfs: align layoutget requests on page boundaries
[PATCH v5 14/38] pnfs: Use byte-range for cb_layoutrecall
[PATCH v5 15/38] pnfs: client stats

Basic ld driver and some std definitions:
[PATCH v5 16/38] pnfs-obj: objlayoutdriver module skeleton
[PATCH v5 17/38] pnfs-obj: pnfs_osd XDR definitions
[PATCH v5 18/38] pnfs-obj: pnfs_osd XDR client implementation

layoutget:
[PATCH v5 19/38] pnfs-obj: decode layout, alloc/free lseg

getdeviceinfo:
[PATCH v5 20/38] pnfs: per mount layout driver private data
[PATCH v5 21/38] pnfs-obj: objio_osd device information retrieval and caching
[PATCH v5 22/38] pnfs: set/unset layoutdriver
[PATCH v5 23/38] SQUASHME: pnfs-obj: use global device cache
[PATCH v5 24/38] SQUASHME: Revert "pnfs: per mount layout driver private data"
[PATCH v5 25/38] SQUASHME: Revert "pnfs: set/unset layoutdriver"
[PATCH v5 26/38] NFSv4.1: use layout driver in global device cache

read/write:
[PATCH v5 27/38] pnfs: alloc and free layout_hdr layoutdriver methods
[PATCH v5 28/38] pnfs-obj: define per-inode private structure
[PATCH v5 29/38] pnfs: support for non-rpc layout drivers
[PATCH v5 30/38] SQUASHME: initialize data->task on the non-rpc io done success paths
[PATCH v5 31/38] pnfs-obj: osd raid engine read/write implementation

layoutreturn:
[PATCH v5 32/38] pnfs: layoutreturn
[PATCH v5 33/38] SQUASHME: pnfs: fix layout stateid in layoutreturn args
[PATCH v5 34/38] pnfs: layoutret_on_setattr
[PATCH v5 35/38] pnfs: encode_layoutreturn
[PATCH v5 36/38] pnfs-obj: report errors and .encode_layoutreturn Implementation.

layoutcommit:
[PATCH v5 37/38] pnfs: encode_layoutcommit
[PATCH v5 38/38] pnfs-obj: objlayout_encode_layoutcommit implementation


* [PATCH v5 01/38] NFSv4.1: use struct nfs_client to qualify deviceid
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
@ 2011-05-22 23:45 ` Benny Halevy
  2011-05-22 23:45 ` [PATCH v5 02/38] pnfs: resolve header dependency in pnfs.h Benny Halevy
                   ` (37 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Deviceids are unique per server, per layout type.
Therefore, deviceids from different servers may clash in the
global cache in the files layout driver, so we need to qualify
them with the struct nfs_client that represents the nfs server
that returned the deviceid.

The problem was introduced in 2.6.39 by commit ea8eecdd
"NFSv4.1 move deviceid cache to filelayout driver"

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
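For reviewers, here is a minimal userspace sketch of the idea, not the
kernel code: a lookup that matches on both the owning client and the
deviceid bytes, so identical deviceids returned by two servers stay
distinct. The names struct client, struct dev_entry and find_deviceid()
are hypothetical stand-ins; the flat array replaces the RCU hash list and
there is no refcounting or locking here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEVICEID_SIZE 16	/* same size as NFS4_DEVICEID4_SIZE */

struct deviceid {
	unsigned char data[DEVICEID_SIZE];
};

/* Hypothetical stand-in for struct nfs_client; only its address is used. */
struct client {
	const char *hostname;
};

struct dev_entry {
	const struct client *clp;	/* server that returned the deviceid */
	struct deviceid id;
};

/* Return the entry matching both the client and the deviceid, or NULL. */
static const struct dev_entry *
find_deviceid(const struct dev_entry *tbl, size_t n,
	      const struct client *clp, const struct deviceid *id)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (tbl[i].clp == clp &&
		    !memcmp(&tbl[i].id, id, sizeof(*id)))
			return &tbl[i];
	return NULL;
}
```

With two entries that carry the same deviceid bytes but different client
pointers, each client finds only its own entry, and a third client finds
nothing.
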
 fs/nfs/nfs4filelayout.c    |    2 +-
 fs/nfs/nfs4filelayout.h    |    3 ++-
 fs/nfs/nfs4filelayoutdev.c |    9 ++++-----
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index be79dc9..ff47fdf 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -440,7 +440,7 @@ filelayout_check_layout(struct pnfs_layout_hdr *lo,
 	}
 
 	/* find and reference the deviceid */
-	dsaddr = nfs4_fl_find_get_deviceid(id);
+	dsaddr = nfs4_fl_find_get_deviceid(NFS_SERVER(lo->plh_inode)->nfs_client, id);
 	if (dsaddr == NULL) {
 		dsaddr = get_device_info(lo->plh_inode, id, gfp_flags);
 		if (dsaddr == NULL)
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 2b461d7..301b955 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -60,6 +60,7 @@ struct nfs4_pnfs_ds {
 
 struct nfs4_file_layout_dsaddr {
 	struct hlist_node		node;
+	struct nfs_client		*nfs_client;
 	struct nfs4_deviceid		deviceid;
 	atomic_t			ref;
 	unsigned long			flags;
@@ -101,7 +102,7 @@ u32 nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, u32 j);
 struct nfs4_pnfs_ds *nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg,
 					u32 ds_idx);
 extern struct nfs4_file_layout_dsaddr *
-nfs4_fl_find_get_deviceid(struct nfs4_deviceid *dev_id);
+nfs4_fl_find_get_deviceid(struct nfs_client *, struct nfs4_deviceid *dev_id);
 extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 struct nfs4_file_layout_dsaddr *
 get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id, gfp_t gfp_flags);
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index db07c7a..42e3266 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -431,7 +431,7 @@ decode_device(struct inode *ino, struct pnfs_device *pdev, gfp_t gfp_flags)
 	dsaddr->stripe_indices = stripe_indices;
 	stripe_indices = NULL;
 	dsaddr->ds_num = num;
-
+	dsaddr->nfs_client = NFS_SERVER(ino)->nfs_client;
 	memcpy(&dsaddr->deviceid, &pdev->dev_id, sizeof(pdev->dev_id));
 
 	for (i = 0; i < dsaddr->ds_num; i++) {
@@ -516,7 +516,7 @@ decode_and_add_device(struct inode *inode, struct pnfs_device *dev, gfp_t gfp_fl
 	}
 
 	spin_lock(&filelayout_deviceid_lock);
-	d = nfs4_fl_find_get_deviceid(&new->deviceid);
+	d = nfs4_fl_find_get_deviceid(new->nfs_client, &new->deviceid);
 	if (d) {
 		spin_unlock(&filelayout_deviceid_lock);
 		nfs4_fl_free_deviceid(new);
@@ -610,16 +610,15 @@ nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
 }
 
 struct nfs4_file_layout_dsaddr *
-nfs4_fl_find_get_deviceid(struct nfs4_deviceid *id)
+nfs4_fl_find_get_deviceid(struct nfs_client *clp, struct nfs4_deviceid *id)
 {
 	struct nfs4_file_layout_dsaddr *d;
 	struct hlist_node *n;
 	long hash = nfs4_fl_deviceid_hash(id);
 
-
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(d, n, &filelayout_deviceid_cache[hash], node) {
-		if (!memcmp(&d->deviceid, id, sizeof(*id))) {
+		if (d->nfs_client == clp && !memcmp(&d->deviceid, id, sizeof(*id))) {
 			if (!atomic_inc_not_zero(&d->ref))
 				goto fail;
 			rcu_read_unlock();
-- 
1.7.3.4



* [PATCH v5 02/38] pnfs: resolve header dependency in pnfs.h
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
  2011-05-22 23:45 ` [PATCH v5 01/38] NFSv4.1: use struct nfs_client to qualify deviceid Benny Halevy
@ 2011-05-22 23:45 ` Benny Halevy
  2011-05-22 23:45 ` [PATCH v5 03/38] NFSv4.1: make deviceid cache global Benny Halevy
                   ` (36 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Some definitions in pnfs.h depend on nfs_fs.h, so pnfs.h can't
be included on its own.  Include nfs_fs.h from pnfs.h.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/pnfs.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 0c015ba..720bb9d 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -30,6 +30,7 @@
 #ifndef FS_NFS_PNFS_H
 #define FS_NFS_PNFS_H
 
+#include <linux/nfs_fs.h>
 #include <linux/nfs_page.h>
 
 enum {
-- 
1.7.3.4



* [PATCH v5 03/38] NFSv4.1: make deviceid cache global
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
  2011-05-22 23:45 ` [PATCH v5 01/38] NFSv4.1: use struct nfs_client to qualify deviceid Benny Halevy
  2011-05-22 23:45 ` [PATCH v5 02/38] pnfs: resolve header dependency in pnfs.h Benny Halevy
@ 2011-05-22 23:45 ` Benny Halevy
  2011-05-22 23:46 ` [PATCH v5 04/38] NFSv4.1: purge deviceid cache on nfs_free_client Benny Halevy
                   ` (35 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Move deviceid cache from the pnfs files layout driver to the
generic layer in preparation for the objects layout driver.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
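As a rough illustration of the layering (a userspace sketch, not the
patch itself): the generic cache only sees a small node, and each layout
driver embeds that node in its own structure and gets back to it with
container_of(). The struct and function names below are hypothetical,
trimmed-down versions of nfs4_deviceid_node and nfs4_file_layout_dsaddr,
and the macro is a plain userspace equivalent of the kernel's
container_of(). In the patch id_node is the first member; it is placed
second here on purpose to show that the position does not matter.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic part, the only thing the shared cache knows about. */
struct devid_node {
	unsigned int ref;
};

/* Driver-private wrapper around the generic node. */
struct file_dsaddr {
	unsigned long flags;
	struct devid_node id_node;
	unsigned int stripe_count;
};

/* Map a generic node handed back by the cache to the driver structure. */
static struct file_dsaddr *to_dsaddr(struct devid_node *d)
{
	assert(d != NULL);
	return container_of(d, struct file_dsaddr, id_node);
}
```

A driver would call something like to_dsaddr() on the node returned by
the generic lookup, which is what filelayout_check_layout() does with
container_of() in this patch.
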
 fs/nfs/Makefile            |    2 +-
 fs/nfs/nfs4filelayout.c    |   10 ++-
 fs/nfs/nfs4filelayout.h    |    8 +--
 fs/nfs/nfs4filelayoutdev.c |  104 ++++-------------------------
 fs/nfs/pnfs.h              |   17 +++++
 fs/nfs/pnfs_dev.c          |  156 ++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 194 insertions(+), 103 deletions(-)
 create mode 100644 fs/nfs/pnfs_dev.c

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 4776ff9..7516a8a 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -15,7 +15,7 @@ nfs-$(CONFIG_NFS_V4)	+= nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
 			   delegation.o idmap.o \
 			   callback.o callback_xdr.o callback_proc.o \
 			   nfs4namespace.o
-nfs-$(CONFIG_NFS_V4_1)	+= pnfs.o
+nfs-$(CONFIG_NFS_V4_1)	+= pnfs.o pnfs_dev.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
 nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o
 
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index ff47fdf..c8e7afe 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -421,6 +421,7 @@ filelayout_check_layout(struct pnfs_layout_hdr *lo,
 			struct nfs4_deviceid *id,
 			gfp_t gfp_flags)
 {
+	struct nfs4_deviceid_node *d;
 	struct nfs4_file_layout_dsaddr *dsaddr;
 	int status = -EINVAL;
 	struct nfs_server *nfss = NFS_SERVER(lo->plh_inode);
@@ -440,12 +441,13 @@ filelayout_check_layout(struct pnfs_layout_hdr *lo,
 	}
 
 	/* find and reference the deviceid */
-	dsaddr = nfs4_fl_find_get_deviceid(NFS_SERVER(lo->plh_inode)->nfs_client, id);
-	if (dsaddr == NULL) {
+	d = nfs4_find_get_deviceid(NFS_SERVER(lo->plh_inode)->nfs_client, id);
+	if (d == NULL) {
 		dsaddr = get_device_info(lo->plh_inode, id, gfp_flags);
 		if (dsaddr == NULL)
 			goto out;
-	}
+	} else
+		dsaddr = container_of(d, struct nfs4_file_layout_dsaddr, id_node);
 	fl->dsaddr = dsaddr;
 
 	if (fl->first_stripe_index < 0 ||
@@ -535,7 +537,7 @@ filelayout_decode_layout(struct pnfs_layout_hdr *flo,
 
 	memcpy(id, p, sizeof(*id));
 	p += XDR_QUADLEN(NFS4_DEVICEID4_SIZE);
-	print_deviceid(id);
+	nfs4_print_deviceid(id);
 
 	nfl_util = be32_to_cpup(p++);
 	if (nfl_util & NFL4_UFLG_COMMIT_THRU_MDS)
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 301b955..0ace0a2 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -59,10 +59,7 @@ struct nfs4_pnfs_ds {
 #define NFS4_DEVICE_ID_NEG_ENTRY	0x00000001
 
 struct nfs4_file_layout_dsaddr {
-	struct hlist_node		node;
-	struct nfs_client		*nfs_client;
-	struct nfs4_deviceid		deviceid;
-	atomic_t			ref;
+	struct nfs4_deviceid_node	id_node;
 	unsigned long			flags;
 	u32				stripe_count;
 	u8				*stripe_indices;
@@ -96,13 +93,10 @@ extern struct nfs_fh *
 nfs4_fl_select_ds_fh(struct pnfs_layout_segment *lseg, u32 j);
 
 extern void print_ds(struct nfs4_pnfs_ds *ds);
-extern void print_deviceid(struct nfs4_deviceid *dev_id);
 u32 nfs4_fl_calc_j_index(struct pnfs_layout_segment *lseg, loff_t offset);
 u32 nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, u32 j);
 struct nfs4_pnfs_ds *nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg,
 					u32 ds_idx);
-extern struct nfs4_file_layout_dsaddr *
-nfs4_fl_find_get_deviceid(struct nfs_client *, struct nfs4_deviceid *dev_id);
 extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 struct nfs4_file_layout_dsaddr *
 get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id, gfp_t gfp_flags);
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 42e3266..eda4527 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -37,30 +37,6 @@
 #define NFSDBG_FACILITY		NFSDBG_PNFS_LD
 
 /*
- * Device ID RCU cache. A device ID is unique per client ID and layout type.
- */
-#define NFS4_FL_DEVICE_ID_HASH_BITS	5
-#define NFS4_FL_DEVICE_ID_HASH_SIZE	(1 << NFS4_FL_DEVICE_ID_HASH_BITS)
-#define NFS4_FL_DEVICE_ID_HASH_MASK	(NFS4_FL_DEVICE_ID_HASH_SIZE - 1)
-
-static inline u32
-nfs4_fl_deviceid_hash(struct nfs4_deviceid *id)
-{
-	unsigned char *cptr = (unsigned char *)id->data;
-	unsigned int nbytes = NFS4_DEVICEID4_SIZE;
-	u32 x = 0;
-
-	while (nbytes--) {
-		x *= 37;
-		x += *cptr++;
-	}
-	return x & NFS4_FL_DEVICE_ID_HASH_MASK;
-}
-
-static struct hlist_head filelayout_deviceid_cache[NFS4_FL_DEVICE_ID_HASH_SIZE];
-static DEFINE_SPINLOCK(filelayout_deviceid_lock);
-
-/*
  * Data server cache
  *
  * Data servers can be mapped to different device ids.
@@ -89,27 +65,6 @@ print_ds(struct nfs4_pnfs_ds *ds)
 		ds->ds_clp ? ds->ds_clp->cl_exchange_flags : 0);
 }
 
-void
-print_ds_list(struct nfs4_file_layout_dsaddr *dsaddr)
-{
-	int i;
-
-	ifdebug(FACILITY) {
-		printk("%s dsaddr->ds_num %d\n", __func__,
-		       dsaddr->ds_num);
-		for (i = 0; i < dsaddr->ds_num; i++)
-			print_ds(dsaddr->ds_list[i]);
-	}
-}
-
-void print_deviceid(struct nfs4_deviceid *id)
-{
-	u32 *p = (u32 *)id;
-
-	dprintk("%s: device id= [%x%x%x%x]\n", __func__,
-		p[0], p[1], p[2], p[3]);
-}
-
 /* nfs4_ds_cache_lock is held */
 static struct nfs4_pnfs_ds *
 _data_server_lookup_locked(u32 ip_addr, u32 port)
@@ -207,7 +162,7 @@ nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
 	struct nfs4_pnfs_ds *ds;
 	int i;
 
-	print_deviceid(&dsaddr->deviceid);
+	nfs4_print_deviceid(&dsaddr->id_node.deviceid);
 
 	for (i = 0; i < dsaddr->ds_num; i++) {
 		ds = dsaddr->ds_list[i];
@@ -431,8 +386,8 @@ decode_device(struct inode *ino, struct pnfs_device *pdev, gfp_t gfp_flags)
 	dsaddr->stripe_indices = stripe_indices;
 	stripe_indices = NULL;
 	dsaddr->ds_num = num;
-	dsaddr->nfs_client = NFS_SERVER(ino)->nfs_client;
-	memcpy(&dsaddr->deviceid, &pdev->dev_id, sizeof(pdev->dev_id));
+	nfs4_init_deviceid_node(&dsaddr->id_node, NFS_SERVER(ino)->nfs_client,
+				&pdev->dev_id);
 
 	for (i = 0; i < dsaddr->ds_num; i++) {
 		int j;
@@ -505,8 +460,8 @@ out_err:
 static struct nfs4_file_layout_dsaddr *
 decode_and_add_device(struct inode *inode, struct pnfs_device *dev, gfp_t gfp_flags)
 {
-	struct nfs4_file_layout_dsaddr *d, *new;
-	long hash;
+	struct nfs4_deviceid_node *d;
+	struct nfs4_file_layout_dsaddr *n, *new;
 
 	new = decode_device(inode, dev, gfp_flags);
 	if (!new) {
@@ -515,20 +470,13 @@ decode_and_add_device(struct inode *inode, struct pnfs_device *dev, gfp_t gfp_fl
 		return NULL;
 	}
 
-	spin_lock(&filelayout_deviceid_lock);
-	d = nfs4_fl_find_get_deviceid(new->nfs_client, &new->deviceid);
-	if (d) {
-		spin_unlock(&filelayout_deviceid_lock);
+	d = nfs4_insert_deviceid_node(&new->id_node);
+	n = container_of(d, struct nfs4_file_layout_dsaddr, id_node);
+	if (n != new) {
 		nfs4_fl_free_deviceid(new);
-		return d;
+		return n;
 	}
 
-	INIT_HLIST_NODE(&new->node);
-	atomic_set(&new->ref, 1);
-	hash = nfs4_fl_deviceid_hash(&new->deviceid);
-	hlist_add_head_rcu(&new->node, &filelayout_deviceid_cache[hash]);
-	spin_unlock(&filelayout_deviceid_lock);
-
 	return new;
 }
 
@@ -600,34 +548,8 @@ out_free:
 void
 nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
 {
-	if (atomic_dec_and_lock(&dsaddr->ref, &filelayout_deviceid_lock)) {
-		hlist_del_rcu(&dsaddr->node);
-		spin_unlock(&filelayout_deviceid_lock);
-
-		synchronize_rcu();
+	if (nfs4_put_deviceid_node(&dsaddr->id_node))
 		nfs4_fl_free_deviceid(dsaddr);
-	}
-}
-
-struct nfs4_file_layout_dsaddr *
-nfs4_fl_find_get_deviceid(struct nfs_client *clp, struct nfs4_deviceid *id)
-{
-	struct nfs4_file_layout_dsaddr *d;
-	struct hlist_node *n;
-	long hash = nfs4_fl_deviceid_hash(id);
-
-	rcu_read_lock();
-	hlist_for_each_entry_rcu(d, n, &filelayout_deviceid_cache[hash], node) {
-		if (d->nfs_client == clp && !memcmp(&d->deviceid, id, sizeof(*id))) {
-			if (!atomic_inc_not_zero(&d->ref))
-				goto fail;
-			rcu_read_unlock();
-			return d;
-		}
-	}
-fail:
-	rcu_read_unlock();
-	return NULL;
 }
 
 /*
@@ -675,15 +597,15 @@ static void
 filelayout_mark_devid_negative(struct nfs4_file_layout_dsaddr *dsaddr,
 			       int err, u32 ds_addr)
 {
-	u32 *p = (u32 *)&dsaddr->deviceid;
+	u32 *p = (u32 *)&dsaddr->id_node.deviceid;
 
 	printk(KERN_ERR "NFS: data server %x connection error %d."
 		" Deviceid [%x%x%x%x] marked out of use.\n",
 		ds_addr, err, p[0], p[1], p[2], p[3]);
 
-	spin_lock(&filelayout_deviceid_lock);
+	spin_lock(&nfs4_ds_cache_lock);
 	dsaddr->flags |= NFS4_DEVICE_ID_NEG_ENTRY;
-	spin_unlock(&filelayout_deviceid_lock);
+	spin_unlock(&nfs4_ds_cache_lock);
 }
 
 struct nfs4_pnfs_ds *
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 720bb9d..3831ad0 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -157,6 +157,23 @@ bool pnfs_roc_drain(struct inode *ino, u32 *barrier);
 void pnfs_set_layoutcommit(struct nfs_write_data *wdata);
 int pnfs_layoutcommit_inode(struct inode *inode, bool sync);
 
+/* pnfs_dev.c */
+struct nfs4_deviceid_node {
+	struct hlist_node		node;
+	const struct nfs_client		*nfs_client;
+	struct nfs4_deviceid		deviceid;
+	atomic_t			ref;
+};
+
+void nfs4_print_deviceid(const struct nfs4_deviceid *dev_id);
+struct nfs4_deviceid_node *nfs4_find_get_deviceid(const struct nfs_client *, const struct nfs4_deviceid *);
+struct nfs4_deviceid_node *nfs4_unhash_put_deviceid(const struct nfs_client *, const struct nfs4_deviceid *);
+void nfs4_init_deviceid_node(struct nfs4_deviceid_node *,
+			     const struct nfs_client *,
+			     const struct nfs4_deviceid *);
+struct nfs4_deviceid_node *nfs4_insert_deviceid_node(struct nfs4_deviceid_node *);
+bool nfs4_put_deviceid_node(struct nfs4_deviceid_node *);
+
 static inline int lo_fail_bit(u32 iomode)
 {
 	return iomode == IOMODE_RW ?
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
new file mode 100644
index 0000000..bf05189
--- /dev/null
+++ b/fs/nfs/pnfs_dev.c
@@ -0,0 +1,156 @@
+/*
+ *  Device operations for the pnfs client.
+ *
+ *  Copyright (c) 2002
+ *  The Regents of the University of Michigan
+ *  All Rights Reserved
+ *
+ *  Dean Hildebrand <dhildebz@umich.edu>
+ *  Garth Goodson   <Garth.Goodson@netapp.com>
+ *
+ *  Permission is granted to use, copy, create derivative works, and
+ *  redistribute this software and such derivative works for any purpose,
+ *  so long as the name of the University of Michigan is not used in
+ *  any advertising or publicity pertaining to the use or distribution
+ *  of this software without specific, written prior authorization. If
+ *  the above copyright notice or any other identification of the
+ *  University of Michigan is included in any copy of any portion of
+ *  this software, then the disclaimer below must also be included.
+ *
+ *  This software is provided as is, without representation or warranty
+ *  of any kind either express or implied, including without limitation
+ *  the implied warranties of merchantability, fitness for a particular
+ *  purpose, or noninfringement.  The Regents of the University of
+ *  Michigan shall not be liable for any damages, including special,
+ *  indirect, incidental, or consequential damages, with respect to any
+ *  claim arising out of or in connection with the use of the software,
+ *  even if it has been or is hereafter advised of the possibility of
+ *  such damages.
+ */
+
+#include "pnfs.h"
+
+#define NFSDBG_FACILITY		NFSDBG_PNFS
+
+/*
+ * Device ID RCU cache. A device ID is unique per server and layout type.
+ */
+#define NFS4_DEVICE_ID_HASH_BITS	5
+#define NFS4_DEVICE_ID_HASH_SIZE	(1 << NFS4_DEVICE_ID_HASH_BITS)
+#define NFS4_DEVICE_ID_HASH_MASK	(NFS4_DEVICE_ID_HASH_SIZE - 1)
+
+static struct hlist_head nfs4_deviceid_cache[NFS4_DEVICE_ID_HASH_SIZE];
+static DEFINE_SPINLOCK(nfs4_deviceid_lock);
+
+void
+nfs4_print_deviceid(const struct nfs4_deviceid *id)
+{
+	u32 *p = (u32 *)id;
+
+	dprintk("%s: device id= [%x%x%x%x]\n", __func__,
+		p[0], p[1], p[2], p[3]);
+}
+EXPORT_SYMBOL_GPL(nfs4_print_deviceid);
+
+static inline u32
+nfs4_deviceid_hash(const struct nfs4_deviceid *id)
+{
+	unsigned char *cptr = (unsigned char *)id->data;
+	unsigned int nbytes = NFS4_DEVICEID4_SIZE;
+	u32 x = 0;
+
+	while (nbytes--) {
+		x *= 37;
+		x += *cptr++;
+	}
+	return x & NFS4_DEVICE_ID_HASH_MASK;
+}
+
+/*
+ * Lookup a deviceid in cache and get a reference count on it if found
+ *
+ * @clp nfs_client associated with deviceid
+ * @id deviceid to look up
+ */
+struct nfs4_deviceid_node *
+nfs4_find_get_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
+{
+	struct nfs4_deviceid_node *d;
+	struct hlist_node *n;
+	long hash = nfs4_deviceid_hash(id);
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(d, n, &nfs4_deviceid_cache[hash], node) {
+		if (d->nfs_client == clp && !memcmp(&d->deviceid, id, sizeof(*id))) {
+			if (!atomic_inc_not_zero(&d->ref))
+				goto fail;
+			rcu_read_unlock();
+			return d;
+		}
+	}
+fail:
+	rcu_read_unlock();
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(nfs4_find_get_deviceid);
+
+void
+nfs4_init_deviceid_node(struct nfs4_deviceid_node *d,
+			const struct nfs_client *nfs_client,
+			const struct nfs4_deviceid *id)
+{
+	d->nfs_client = nfs_client;
+	d->deviceid = *id;
+}
+EXPORT_SYMBOL_GPL(nfs4_init_deviceid_node);
+
+/*
+ * Uniquely initialize and insert a deviceid node into cache
+ *
+ * @new new deviceid node
+ *      Note that the caller must set up new->nfs_client and new->deviceid
+ *
+ * @ret the inserted node, if none found, otherwise, the found entry.
+ */
+struct nfs4_deviceid_node *
+nfs4_insert_deviceid_node(struct nfs4_deviceid_node *new)
+{
+	struct nfs4_deviceid_node *d;
+	long hash;
+
+	spin_lock(&nfs4_deviceid_lock);
+	d = nfs4_find_get_deviceid(new->nfs_client, &new->deviceid);
+	if (d) {
+		spin_unlock(&nfs4_deviceid_lock);
+		return d;
+	}
+
+	INIT_HLIST_NODE(&new->node);
+	atomic_set(&new->ref, 1);
+	hash = nfs4_deviceid_hash(&new->deviceid);
+	hlist_add_head_rcu(&new->node, &nfs4_deviceid_cache[hash]);
+	spin_unlock(&nfs4_deviceid_lock);
+
+	return new;
+}
+EXPORT_SYMBOL_GPL(nfs4_insert_deviceid_node);
+
+/*
+ * Dereference a deviceid node and delete it when its reference count drops
+ * to zero.
+ *
+ * @d deviceid node to put
+ *
+ * @ret true iff the node was deleted
+ */
+bool
+nfs4_put_deviceid_node(struct nfs4_deviceid_node *d)
+{
+	if (!atomic_dec_and_lock(&d->ref, &nfs4_deviceid_lock))
+		return false;
+	hlist_del_init_rcu(&d->node);
+	spin_unlock(&nfs4_deviceid_lock);
+	synchronize_rcu();
+	return true;
+}
+EXPORT_SYMBOL_GPL(nfs4_put_deviceid_node);
-- 
1.7.3.4



* [PATCH v5 04/38] NFSv4.1: purge deviceid cache on nfs_free_client
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (2 preceding siblings ...)
  2011-05-22 23:45 ` [PATCH v5 03/38] NFSv4.1: make deviceid cache global Benny Halevy
@ 2011-05-22 23:46 ` Benny Halevy
  2011-05-22 23:46 ` [PATCH v5 05/38] pnfs: CB_NOTIFY_DEVICEID Benny Halevy
                   ` (34 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:46 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Use the pnfs_layoutdriver_type both as a qualifier for the deviceid,
distinguishing deviceids from different layout types on the server,
and for freeing the layout-driver allocated structure containing the
nfs4_deviceid_node.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
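A simplified, single-threaded sketch of the "free through the layout
driver" idea, for illustration only: dropping the last reference calls
the driver's free hook, since only the driver knows the containing
structure. struct layoutdriver, struct devid_node, put_devid_node() and
count_free() are hypothetical names; a plain int replaces atomic_t, and
the locking and RCU grace period used by nfs4_put_deviceid_node() are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct devid_node;

/* Hypothetical, trimmed-down layout driver ops table. */
struct layoutdriver {
	void (*free_deviceid_node)(struct devid_node *);
};

struct devid_node {
	const struct layoutdriver *ld;	/* driver that owns this node */
	int ref;			/* plain int; the kernel uses atomic_t */
};

/* Drop one reference; free via the driver hook when it reaches zero. */
static bool put_devid_node(struct devid_node *d)
{
	assert(d->ref > 0);
	if (--d->ref > 0)
		return false;
	if (d->ld->free_deviceid_node)
		d->ld->free_deviceid_node(d);
	return true;
}

/* Example driver hook: counts calls instead of freeing memory. */
static int freed_count;

static void count_free(struct devid_node *d)
{
	(void)d;
	freed_count++;
}
```

With a node that starts at ref == 2, the first put returns false and the
hook is not called; the second put returns true and the hook runs once.
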
 fs/nfs/client.c            |    2 +
 fs/nfs/nfs4filelayout.c    |    7 +++++
 fs/nfs/nfs4filelayout.h    |    1 +
 fs/nfs/nfs4filelayoutdev.c |    9 ++++---
 fs/nfs/pnfs.h              |   11 +++++++++
 fs/nfs/pnfs_dev.c          |   53 +++++++++++++++++++++++++++++++++++++++++--
 6 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 139be96..b3dc2b8 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -290,6 +290,8 @@ static void nfs_free_client(struct nfs_client *clp)
 	if (clp->cl_machine_cred != NULL)
 		put_rpccred(clp->cl_machine_cred);
 
+	nfs4_deviceid_purge_client(clp);
+
 	kfree(clp->cl_hostname);
 	kfree(clp);
 
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index c8e7afe..c181a8b 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -862,6 +862,12 @@ filelayout_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
 	return -ENOMEM;
 }
 
+static void
+filelayout_free_deveiceid_node(struct nfs4_deviceid_node *d)
+{
+	nfs4_fl_free_deviceid(container_of(d, struct nfs4_file_layout_dsaddr, id_node));
+}
+
 static struct pnfs_layoutdriver_type filelayout_type = {
 	.id			= LAYOUT_NFSV4_1_FILES,
 	.name			= "LAYOUT_NFSV4_1_FILES",
@@ -874,6 +880,7 @@ static struct pnfs_layoutdriver_type filelayout_type = {
 	.commit_pagelist	= filelayout_commit_pagelist,
 	.read_pagelist		= filelayout_read_pagelist,
 	.write_pagelist		= filelayout_write_pagelist,
+	.free_deviceid_node	= filelayout_free_deveiceid_node,
 };
 
 static int __init nfs4filelayout_init(void)
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 0ace0a2..cebe01e 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -98,6 +98,7 @@ u32 nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, u32 j);
 struct nfs4_pnfs_ds *nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg,
 					u32 ds_idx);
 extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
+extern void nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 struct nfs4_file_layout_dsaddr *
 get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id, gfp_t gfp_flags);
 
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index eda4527..5914659 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -156,7 +156,7 @@ destroy_ds(struct nfs4_pnfs_ds *ds)
 	kfree(ds);
 }
 
-static void
+void
 nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
 {
 	struct nfs4_pnfs_ds *ds;
@@ -386,7 +386,9 @@ decode_device(struct inode *ino, struct pnfs_device *pdev, gfp_t gfp_flags)
 	dsaddr->stripe_indices = stripe_indices;
 	stripe_indices = NULL;
 	dsaddr->ds_num = num;
-	nfs4_init_deviceid_node(&dsaddr->id_node, NFS_SERVER(ino)->nfs_client,
+	nfs4_init_deviceid_node(&dsaddr->id_node,
+				NFS_SERVER(ino)->pnfs_curr_ld,
+				NFS_SERVER(ino)->nfs_client,
 				&pdev->dev_id);
 
 	for (i = 0; i < dsaddr->ds_num; i++) {
@@ -548,8 +550,7 @@ out_free:
 void
 nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
 {
-	if (nfs4_put_deviceid_node(&dsaddr->id_node))
-		nfs4_fl_free_deviceid(dsaddr);
+	nfs4_put_deviceid_node(&dsaddr->id_node);
 }
 
 /*
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 3831ad0..9667a62 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -65,6 +65,8 @@ enum {
 	NFS_LAYOUT_DESTROYED,		/* no new use of layout allowed */
 };
 
+struct nfs4_deviceid_node;
+
 /* Per-layout driver specific registration structure */
 struct pnfs_layoutdriver_type {
 	struct list_head pnfs_tblid;
@@ -90,6 +92,8 @@ struct pnfs_layoutdriver_type {
 	 */
 	enum pnfs_try_status (*read_pagelist) (struct nfs_read_data *nfs_data);
 	enum pnfs_try_status (*write_pagelist) (struct nfs_write_data *nfs_data, int how);
+
+	void (*free_deviceid_node) (struct nfs4_deviceid_node *);
 };
 
 struct pnfs_layout_hdr {
@@ -160,6 +164,7 @@ int pnfs_layoutcommit_inode(struct inode *inode, bool sync);
 /* pnfs_dev.c */
 struct nfs4_deviceid_node {
 	struct hlist_node		node;
+	const struct pnfs_layoutdriver_type *ld;
 	const struct nfs_client		*nfs_client;
 	struct nfs4_deviceid		deviceid;
 	atomic_t			ref;
@@ -169,10 +174,12 @@ void nfs4_print_deviceid(const struct nfs4_deviceid *dev_id);
 struct nfs4_deviceid_node *nfs4_find_get_deviceid(const struct nfs_client *, const struct nfs4_deviceid *);
 struct nfs4_deviceid_node *nfs4_unhash_put_deviceid(const struct nfs_client *, const struct nfs4_deviceid *);
 void nfs4_init_deviceid_node(struct nfs4_deviceid_node *,
+			     const struct pnfs_layoutdriver_type *,
 			     const struct nfs_client *,
 			     const struct nfs4_deviceid *);
 struct nfs4_deviceid_node *nfs4_insert_deviceid_node(struct nfs4_deviceid_node *);
 bool nfs4_put_deviceid_node(struct nfs4_deviceid_node *);
+void nfs4_deviceid_purge_client(const struct nfs_client *);
 
 static inline int lo_fail_bit(u32 iomode)
 {
@@ -349,6 +356,10 @@ static inline int pnfs_layoutcommit_inode(struct inode *inode, bool sync)
 {
 	return 0;
 }
+
+static inline void nfs4_deviceid_purge_client(struct nfs_client *clp)
+{
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 #endif /* FS_NFS_PNFS_H */
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index bf05189..3cd7854 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -96,11 +96,15 @@ EXPORT_SYMBOL_GPL(nfs4_find_get_deviceid);
 
 void
 nfs4_init_deviceid_node(struct nfs4_deviceid_node *d,
+			const struct pnfs_layoutdriver_type *ld,
 			const struct nfs_client *nfs_client,
 			const struct nfs4_deviceid *id)
 {
+	INIT_HLIST_NODE(&d->node);
+	d->ld = ld;
 	d->nfs_client = nfs_client;
 	d->deviceid = *id;
+	atomic_set(&d->ref, 1);
 }
 EXPORT_SYMBOL_GPL(nfs4_init_deviceid_node);
 
@@ -108,7 +112,10 @@ EXPORT_SYMBOL_GPL(nfs4_init_deviceid_node);
  * Uniquely initialize and insert a deviceid node into cache
  *
  * @new new deviceid node
- *      Note that the caller must set up new->nfs_client and new->deviceid
+ *      Note that the caller must set up the following members:
+ *        new->ld
+ *        new->nfs_client
+ *        new->deviceid
  *
  * @ret the inserted node, if none found, otherwise, the found entry.
  */
@@ -125,8 +132,6 @@ nfs4_insert_deviceid_node(struct nfs4_deviceid_node *new)
 		return d;
 	}
 
-	INIT_HLIST_NODE(&new->node);
-	atomic_set(&new->ref, 1);
 	hash = nfs4_deviceid_hash(&new->deviceid);
 	hlist_add_head_rcu(&new->node, &nfs4_deviceid_cache[hash]);
 	spin_unlock(&nfs4_deviceid_lock);
@@ -151,6 +156,48 @@ nfs4_put_deviceid_node(struct nfs4_deviceid_node *d)
 	hlist_del_init_rcu(&d->node);
 	spin_unlock(&nfs4_deviceid_lock);
 	synchronize_rcu();
+	if (d->ld->free_deviceid_node)
+		d->ld->free_deviceid_node(d);
 	return true;
 }
 EXPORT_SYMBOL_GPL(nfs4_put_deviceid_node);
+
+static void
+_deviceid_purge_client(const struct nfs_client *clp, long hash)
+{
+	struct nfs4_deviceid_node *d;
+	struct hlist_node *n, *next;
+	HLIST_HEAD(tmp);
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(d, n, &nfs4_deviceid_cache[hash], node)
+		if (d->nfs_client == clp && atomic_read(&d->ref)) {
+			hlist_del_init_rcu(&d->node);
+			hlist_add_head(&d->node, &tmp);
+		}
+	rcu_read_unlock();
+
+	if (hlist_empty(&tmp))
+		return;
+
+	synchronize_rcu();
+	hlist_for_each_entry_safe(d, n, next, &tmp, node)
+		if (atomic_dec_and_test(&d->ref)) {
+			if (d->ld->free_deviceid_node)
+				d->ld->free_deviceid_node(d);
+			else
+				kfree(d);
+		}
+}
+
+void
+nfs4_deviceid_purge_client(const struct nfs_client *clp)
+{
+	long h;
+
+	spin_lock(&nfs4_deviceid_lock);
+	for (h = 0; h < NFS4_DEVICE_ID_HASH_SIZE; h++)
+		_deviceid_purge_client(clp, h);
+	spin_unlock(&nfs4_deviceid_lock);
+}
+EXPORT_SYMBOL_GPL(nfs4_deviceid_purge_client);
-- 
1.7.3.4



* [PATCH v5 05/38] pnfs: CB_NOTIFY_DEVICEID
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (3 preceding siblings ...)
  2011-05-22 23:46 ` [PATCH v5 04/38] NFSv4.1: purge deviceid cache on nfs_free_client Benny Halevy
@ 2011-05-22 23:46 ` Benny Halevy
  2011-05-22 23:46 ` [PATCH v5 06/38] SQUASHME: use be32 res in nfs4_callback_devicenotify Benny Halevy
                   ` (33 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:46 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

From: Marc Eshel <eshel@almaden.ibm.com>

Note: This functionality is incomplete, as all layout segments referring to
the 'to be removed device id' need to be reaped, and all in-flight I/O drained.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
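For reviewers: the per-item wire layout that decode_devicenotify_args
expects can be sketched in userspace as below. This is only an
illustration under stated assumptions, not the kernel code: the struct
and function names are hypothetical, the two NOTIFY_* values are
placeholders chosen for the sketch, and a flat byte buffer stands in
for the xdr_stream. It follows the opaque-size check in the patch
(deviceid + 8 bytes for CHANGE, deviceid + 4 bytes for DELETE).

```c
#include <arpa/inet.h>	/* htonl, ntohl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEVICEID_SIZE	16u		/* stands in for NFS4_DEVICEID4_SIZE */
#define NOTIFY_CHANGE	(1u << 1)	/* placeholder value (assumption) */
#define NOTIFY_DELETE	(1u << 2)	/* placeholder value (assumption) */

struct notify_item {			/* hypothetical, mirrors cb_devicenotifyitem */
	uint32_t notify_type;
	uint32_t layout_type;
	uint8_t  dev_id[DEVICEID_SIZE];
	uint32_t immediate;
};

static uint32_t get_be32(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return ntohl(v);
}

static void put_be32(uint8_t *p, uint32_t v)
{
	v = htonl(v);
	memcpy(p, &v, sizeof(v));
}

/* Encode one item; buf must hold at least 36 bytes. Returns bytes written. */
static size_t encode_notify_item(const struct notify_item *it, uint8_t *buf)
{
	int change = it->notify_type == NOTIFY_CHANGE;
	size_t off = 0;

	put_be32(buf + off, 1); off += 4;		/* bitmap size */
	put_be32(buf + off, it->notify_type); off += 4;
	put_be32(buf + off, DEVICEID_SIZE + (change ? 8u : 4u)); off += 4;
	put_be32(buf + off, it->layout_type); off += 4;
	memcpy(buf + off, it->dev_id, DEVICEID_SIZE); off += DEVICEID_SIZE;
	if (change) {
		put_be32(buf + off, it->immediate); off += 4;
	}
	return off;
}

/* Decode one item. Returns bytes consumed, or -1 if short or malformed. */
static long decode_notify_item(const uint8_t *buf, size_t len,
			       struct notify_item *it)
{
	const size_t fixed = 4 * sizeof(uint32_t) + DEVICEID_SIZE;
	size_t off = 0;
	uint32_t opaque_len;

	assert(buf && it);
	if (len < fixed)
		return -1;
	if (get_be32(buf + off) != 1)		/* exactly one bitmap word */
		return -1;
	off += 4;
	it->notify_type = get_be32(buf + off); off += 4;
	if (it->notify_type != NOTIFY_CHANGE && it->notify_type != NOTIFY_DELETE)
		return -1;
	opaque_len = get_be32(buf + off); off += 4;
	if (opaque_len != DEVICEID_SIZE +
			  (it->notify_type == NOTIFY_CHANGE ? 8u : 4u))
		return -1;
	it->layout_type = get_be32(buf + off); off += 4;
	memcpy(it->dev_id, buf + off, DEVICEID_SIZE); off += DEVICEID_SIZE;
	it->immediate = 0;
	if (it->notify_type == NOTIFY_CHANGE) {	/* trailing "immediate" word */
		if (len - off < 4)
			return -1;
		it->immediate = get_be32(buf + off); off += 4;
	}
	return (long)off;
}
```

A DELETE item round-trips through encode_notify_item and
decode_notify_item in 32 bytes, a CHANGE item in 36; truncating the
CHANGE item by one word makes the decoder return -1.
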
 fs/nfs/callback.h          |   17 ++++++++
 fs/nfs/callback_proc.c     |   51 +++++++++++++++++++++++
 fs/nfs/callback_xdr.c      |   96 +++++++++++++++++++++++++++++++++++++++++++-
 fs/nfs/nfs4filelayout.c    |    1 +
 fs/nfs/nfs4filelayout.h    |    1 +
 fs/nfs/nfs4filelayoutdev.c |   36 ++++++++++++++++
 fs/nfs/pnfs.h              |    1 +
 7 files changed, 202 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index 46d93ce..b257383 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -167,6 +167,23 @@ extern unsigned nfs4_callback_layoutrecall(
 
 extern void nfs4_check_drain_bc_complete(struct nfs4_session *ses);
 extern void nfs4_cb_take_slot(struct nfs_client *clp);
+
+struct cb_devicenotifyitem {
+	uint32_t		cbd_notify_type;
+	uint32_t		cbd_layout_type;
+	struct nfs4_deviceid	cbd_dev_id;
+	uint32_t		cbd_immediate;
+};
+
+struct cb_devicenotifyargs {
+	int				 ndevs;
+	struct cb_devicenotifyitem	 *devs;
+};
+
+extern __be32 nfs4_callback_devicenotify(
+	struct cb_devicenotifyargs *args,
+	void *dummy, struct cb_process_state *cps);
+
 #endif /* CONFIG_NFS_V4_1 */
 extern int check_gss_callback_principal(struct nfs_client *, struct svc_rqst *);
 extern __be32 nfs4_callback_getattr(struct cb_getattrargs *args,
diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 2f41dcce..975c8f2 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -241,6 +241,57 @@ static void pnfs_recall_all_layouts(struct nfs_client *clp)
 	do_callback_layoutrecall(clp, &args);
 }
 
+__be32 nfs4_callback_devicenotify(struct cb_devicenotifyargs *args,
+				  void *dummy, struct cb_process_state *cps)
+{
+	int i;
+	u32 res = 0;
+	struct nfs_client *clp = cps->clp;
+	struct nfs_server *server = NULL;
+
+	dprintk("%s: -->\n", __func__);
+
+	if (!clp) {
+		res = NFS4ERR_OP_NOT_IN_SESSION;
+		goto out;
+	}
+
+	for (i = 0; i < args->ndevs; i++) {
+		struct cb_devicenotifyitem *dev = &args->devs[i];
+
+		if (!server ||
+		    server->pnfs_curr_ld->id != dev->cbd_layout_type) {
+			rcu_read_lock();
+			list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link)
+				if (server->pnfs_curr_ld &&
+				    server->pnfs_curr_ld->id == dev->cbd_layout_type) {
+					rcu_read_unlock();
+					goto found;
+				}
+			rcu_read_unlock();
+			dprintk("%s: layout type %u not found\n",
+				__func__, dev->cbd_layout_type);
+			continue;
+		}
+
+	found:
+		if (!server->pnfs_curr_ld->delete_deviceid) {
+			res = NFS4ERR_NOTSUPP;
+			break;
+		}
+		if (dev->cbd_notify_type == NOTIFY_DEVICEID4_CHANGE)
+			dprintk("%s: NOTIFY_DEVICEID4_CHANGE not supported, "
+				"deleting instead\n", __func__);
+		server->pnfs_curr_ld->delete_deviceid(&dev->cbd_dev_id);
+	}
+
+out:
+	kfree(args->devs);
+	dprintk("%s: exit with status = %u\n",
+		__func__, res);
+	return cpu_to_be32(res);
+}
+
 int nfs41_validate_delegation_stateid(struct nfs_delegation *delegation, const nfs4_stateid *stateid)
 {
 	if (delegation == NULL)
diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index 00ecf62..c6c86a7 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -25,6 +25,7 @@
 
 #if defined(CONFIG_NFS_V4_1)
 #define CB_OP_LAYOUTRECALL_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ)
+#define CB_OP_DEVICENOTIFY_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ)
 #define CB_OP_SEQUENCE_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ + \
 					4 + 1 + 3)
 #define CB_OP_RECALLANY_RES_MAXSZ	(CB_OP_HDR_RES_MAXSZ)
@@ -284,6 +285,93 @@ out:
 	return status;
 }
 
+static
+__be32 decode_devicenotify_args(struct svc_rqst *rqstp,
+				struct xdr_stream *xdr,
+				struct cb_devicenotifyargs *args)
+{
+	__be32 *p;
+	__be32 status = 0;
+	u32 tmp;
+	int n, i;
+	args->ndevs = 0;
+
+	/* Num of device notifications */
+	p = read_buf(xdr, sizeof(uint32_t));
+	if (unlikely(p == NULL)) {
+		status = htonl(NFS4ERR_BADXDR);
+		goto out;
+	}
+	n = ntohl(*p++);
+	if (n <= 0)
+		goto out;
+
+	args->devs = kmalloc(n * sizeof(*args->devs), GFP_KERNEL);
+	if (!args->devs) {
+		status = htonl(NFS4ERR_DELAY);
+		goto out;
+	}
+
+	/* Decode each dev notification */
+	for (i = 0; i < n; i++) {
+		struct cb_devicenotifyitem *dev = &args->devs[i];
+
+		p = read_buf(xdr, (4 * sizeof(uint32_t)) + NFS4_DEVICEID4_SIZE);
+		if (unlikely(p == NULL)) {
+			status = htonl(NFS4ERR_BADXDR);
+			goto err;
+		}
+
+		tmp = ntohl(*p++);	/* bitmap size */
+		if (tmp != 1) {
+			status = htonl(NFS4ERR_INVAL);
+			goto err;
+		}
+		dev->cbd_notify_type = ntohl(*p++);
+		if (dev->cbd_notify_type != NOTIFY_DEVICEID4_CHANGE &&
+		    dev->cbd_notify_type != NOTIFY_DEVICEID4_DELETE) {
+			status = htonl(NFS4ERR_INVAL);
+			goto err;
+		}
+
+		tmp = ntohl(*p++);	/* opaque size */
+		if (((dev->cbd_notify_type == NOTIFY_DEVICEID4_CHANGE) &&
+		     (tmp != NFS4_DEVICEID4_SIZE + 8)) ||
+		    ((dev->cbd_notify_type == NOTIFY_DEVICEID4_DELETE) &&
+		     (tmp != NFS4_DEVICEID4_SIZE + 4))) {
+			status = htonl(NFS4ERR_INVAL);
+			goto err;
+		}
+		dev->cbd_layout_type = ntohl(*p++);
+		memcpy(dev->cbd_dev_id.data, p, NFS4_DEVICEID4_SIZE);
+		p += XDR_QUADLEN(NFS4_DEVICEID4_SIZE);
+
+		if (dev->cbd_notify_type == NOTIFY_DEVICEID4_CHANGE) {
+			p = read_buf(xdr, sizeof(uint32_t));
+			if (unlikely(p == NULL)) {
+				status = htonl(NFS4ERR_BADXDR);
+				goto err;
+			}
+			dev->cbd_immediate = ntohl(*p++);
+		} else {
+			dev->cbd_immediate = 0;
+		}
+
+		args->ndevs++;
+
+		dprintk("%s: type %d layout 0x%x immediate %d\n",
+			__func__, dev->cbd_notify_type, dev->cbd_layout_type,
+			dev->cbd_immediate);
+	}
+out:
+	dprintk("%s: status %d ndevs %d\n",
+		__func__, ntohl(status), args->ndevs);
+	return status;
+err:
+	kfree(args->devs);
+	goto out;
+}
+
 static __be32 decode_sessionid(struct xdr_stream *xdr,
 				 struct nfs4_sessionid *sid)
 {
@@ -639,10 +727,10 @@ preprocess_nfs41_op(int nop, unsigned int op_nr, struct callback_op **op)
 	case OP_CB_RECALL_ANY:
 	case OP_CB_RECALL_SLOT:
 	case OP_CB_LAYOUTRECALL:
+	case OP_CB_NOTIFY_DEVICEID:
 		*op = &callback_ops[op_nr];
 		break;
 
-	case OP_CB_NOTIFY_DEVICEID:
 	case OP_CB_NOTIFY:
 	case OP_CB_PUSH_DELEG:
 	case OP_CB_RECALLABLE_OBJ_AVAIL:
@@ -849,6 +937,12 @@ static struct callback_op callback_ops[] = {
 			(callback_decode_arg_t)decode_layoutrecall_args,
 		.res_maxsize = CB_OP_LAYOUTRECALL_RES_MAXSZ,
 	},
+	[OP_CB_NOTIFY_DEVICEID] = {
+		.process_op = (callback_process_op_t)nfs4_callback_devicenotify,
+		.decode_args =
+			(callback_decode_arg_t)decode_devicenotify_args,
+		.res_maxsize = CB_OP_DEVICENOTIFY_RES_MAXSZ,
+	},
 	[OP_CB_SEQUENCE] = {
 		.process_op = (callback_process_op_t)nfs4_callback_sequence,
 		.decode_args = (callback_decode_arg_t)decode_cb_sequence_args,
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index c181a8b..43284d9 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -881,6 +881,7 @@ static struct pnfs_layoutdriver_type filelayout_type = {
 	.read_pagelist		= filelayout_read_pagelist,
 	.write_pagelist		= filelayout_write_pagelist,
 	.free_deviceid_node	= filelayout_free_deveiceid_node,
+	.delete_deviceid	= filelayout_delete_deviceid,
 };
 
 static int __init nfs4filelayout_init(void)
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index cebe01e..8601656 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -101,5 +101,6 @@ extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 extern void nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 struct nfs4_file_layout_dsaddr *
 get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id, gfp_t gfp_flags);
+void filelayout_delete_deviceid(struct nfs4_deviceid *);
 
 #endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 5914659..f7f0bb8 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -553,6 +553,42 @@ nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
 	nfs4_put_deviceid_node(&dsaddr->id_node);
 }
 
+static struct nfs4_file_layout_dsaddr *
+nfs4_fl_unhash_deviceid(struct nfs4_deviceid *id)
+{
+	struct nfs4_file_layout_dsaddr *d;
+	struct hlist_node *n;
+	long hash = nfs4_fl_deviceid_hash(id);
+
+	dprintk("%s: hash %ld\n", __func__, hash);
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(d, n, &filelayout_deviceid_cache[hash], node)
+		if (!memcmp(&d->deviceid, id, sizeof(*id)))
+			goto found;
+	rcu_read_unlock();
+	return NULL;
+
+found:
+	rcu_read_unlock();
+	spin_lock(&filelayout_deviceid_lock);
+	hlist_del_init_rcu(&d->node);
+	spin_unlock(&filelayout_deviceid_lock);
+	synchronize_rcu();
+
+	return d;
+}
+
+void
+filelayout_delete_deviceid(struct nfs4_deviceid *id)
+{
+	struct nfs4_file_layout_dsaddr *d;
+
+	d = nfs4_fl_unhash_deviceid(id);
+	/* balance the initial ref taken in decode_and_add_device */
+	if (d && atomic_dec_and_test(&d->ref))
+		nfs4_fl_free_deviceid(d);
+}
+
 /*
  * Want res = (offset - layout->pattern_offset)/ layout->stripe_unit
  * Then: ((res + fsi) % dsaddr->stripe_count)
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 9667a62..fa19c97 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -94,6 +94,7 @@ struct pnfs_layoutdriver_type {
 	enum pnfs_try_status (*write_pagelist) (struct nfs_write_data *nfs_data, int how);
 
 	void (*free_deviceid_node) (struct nfs4_deviceid_node *);
+	void (*delete_deviceid)(struct nfs4_deviceid *);
 };
 
 struct pnfs_layout_hdr {
-- 
1.7.3.4



* [PATCH v5 06/38] SQUASHME: use be32 res in nfs4_callback_devicenotify
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (4 preceding siblings ...)
  2011-05-22 23:46 ` [PATCH v5 05/38] pnfs: CB_NOTIFY_DEVICEID Benny Halevy
@ 2011-05-22 23:46 ` Benny Halevy
  2011-05-22 23:47 ` [PATCH v5 07/38] SQUASHME: pnfs: use nfs_client to qualify deviceid for cb_notify_deviceid Benny Halevy
                   ` (32 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:46 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/callback_proc.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 975c8f2..95107fe 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -245,14 +245,14 @@ __be32 nfs4_callback_devicenotify(struct cb_devicenotifyargs *args,
 				  void *dummy, struct cb_process_state *cps)
 {
 	int i;
-	u32 res = 0;
+	__be32 res = 0;
 	struct nfs_client *clp = cps->clp;
 	struct nfs_server *server = NULL;
 
 	dprintk("%s: -->\n", __func__);
 
 	if (!clp) {
-		res = NFS4ERR_OP_NOT_IN_SESSION;
+		res = cpu_to_be32(NFS4ERR_OP_NOT_IN_SESSION);
 		goto out;
 	}
 
@@ -276,7 +276,7 @@ __be32 nfs4_callback_devicenotify(struct cb_devicenotifyargs *args,
 
 	found:
 		if (!server->pnfs_curr_ld->delete_deviceid) {
-			res = NFS4ERR_NOTSUPP;
+			res = cpu_to_be32(NFS4ERR_NOTSUPP);
 			break;
 		}
 		if (dev->cbd_notify_type == NOTIFY_DEVICEID4_CHANGE)
@@ -288,8 +288,8 @@ __be32 nfs4_callback_devicenotify(struct cb_devicenotifyargs *args,
 out:
 	kfree(args->devs);
 	dprintk("%s: exit with status = %u\n",
-		__func__, res);
-	return cpu_to_be32(res);
+		__func__, be32_to_cpu(res));
+	return res;
 }
 
 int nfs41_validate_delegation_stateid(struct nfs_delegation *delegation, const nfs4_stateid *stateid)
-- 
1.7.3.4



* [PATCH v5 07/38] SQUASHME: pnfs: use nfs_client to qualify deviceid for cb_notify_deviceid
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (5 preceding siblings ...)
  2011-05-22 23:46 ` [PATCH v5 06/38] SQUASHME: use be32 res in nfs4_callback_devicenotify Benny Halevy
@ 2011-05-22 23:47 ` Benny Halevy
  2011-05-22 23:47 ` [PATCH v5 08/38] SQUASHME: pnfs: use global deviceid cache for CB_NOTIFY_DEVICEID Benny Halevy
                   ` (31 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:47 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/callback_proc.c     |    2 +-
 fs/nfs/nfs4filelayout.h    |    2 +-
 fs/nfs/nfs4filelayoutdev.c |    8 ++++----
 fs/nfs/pnfs.h              |    2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 95107fe..fc96057 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -282,7 +282,7 @@ __be32 nfs4_callback_devicenotify(struct cb_devicenotifyargs *args,
 		if (dev->cbd_notify_type == NOTIFY_DEVICEID4_CHANGE)
 			dprintk("%s: NOTIFY_DEVICEID4_CHANGE not supported, "
 				"deleting instead\n", __func__);
-		server->pnfs_curr_ld->delete_deviceid(&dev->cbd_dev_id);
+		server->pnfs_curr_ld->delete_deviceid(clp, &dev->cbd_dev_id);
 	}
 
 out:
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 8601656..4289dbf 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -101,6 +101,6 @@ extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 extern void nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 struct nfs4_file_layout_dsaddr *
 get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id, gfp_t gfp_flags);
-void filelayout_delete_deviceid(struct nfs4_deviceid *);
+void filelayout_delete_deviceid(const struct nfs_client *, const struct nfs4_deviceid *);
 
 #endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index f7f0bb8..f23a7f4 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -554,7 +554,7 @@ nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
 }
 
 static struct nfs4_file_layout_dsaddr *
-nfs4_fl_unhash_deviceid(struct nfs4_deviceid *id)
+nfs4_fl_unhash_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
 {
 	struct nfs4_file_layout_dsaddr *d;
 	struct hlist_node *n;
@@ -563,7 +563,7 @@ nfs4_fl_unhash_deviceid(struct nfs4_deviceid *id)
 	dprintk("%s: hash %ld\n", __func__, hash);
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(d, n, &filelayout_deviceid_cache[hash], node)
-		if (!memcmp(&d->deviceid, id, sizeof(*id)))
+		if (d->nfs_client == clp && !memcmp(&d->deviceid, id, sizeof(*id)))
 			goto found;
 	rcu_read_unlock();
 	return NULL;
@@ -579,11 +579,11 @@ found:
 }
 
 void
-filelayout_delete_deviceid(struct nfs4_deviceid *id)
+filelayout_delete_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
 {
 	struct nfs4_file_layout_dsaddr *d;
 
-	d = nfs4_fl_unhash_deviceid(id);
+	d = nfs4_fl_unhash_deviceid(clp, id);
 	/* balance the initial ref taken in decode_and_add_device */
 	if (d && atomic_dec_and_test(&d->ref))
 		nfs4_fl_free_deviceid(d);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index fa19c97..64ebd76 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -94,7 +94,7 @@ struct pnfs_layoutdriver_type {
 	enum pnfs_try_status (*write_pagelist) (struct nfs_write_data *nfs_data, int how);
 
 	void (*free_deviceid_node) (struct nfs4_deviceid_node *);
-	void (*delete_deviceid)(struct nfs4_deviceid *);
+	void (*delete_deviceid)(const struct nfs_client *, const struct nfs4_deviceid *);
 };
 
 struct pnfs_layout_hdr {
-- 
1.7.3.4



* [PATCH v5 08/38] SQUASHME: pnfs: use global deviceid cache for CB_NOTIFY_DEVICEID
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (6 preceding siblings ...)
  2011-05-22 23:47 ` [PATCH v5 07/38] SQUASHME: pnfs: use nfs_client to qualify deviceid for cb_notify_deviceid Benny Halevy
@ 2011-05-22 23:47 ` Benny Halevy
  2011-05-22 23:48 ` [PATCH v5 09/38] SQUASHME: pnfs: refactor device cache _lookup_deviceid Benny Halevy
                   ` (30 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:47 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/callback_proc.c     |    6 +----
 fs/nfs/nfs4filelayout.c    |    1 -
 fs/nfs/nfs4filelayout.h    |    1 -
 fs/nfs/nfs4filelayoutdev.c |   36 ---------------------------
 fs/nfs/pnfs.h              |    2 +-
 fs/nfs/pnfs_dev.c          |   58 ++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 60 insertions(+), 44 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index fc96057..fb5e5b9 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -275,14 +275,10 @@ __be32 nfs4_callback_devicenotify(struct cb_devicenotifyargs *args,
 		}
 
 	found:
-		if (!server->pnfs_curr_ld->delete_deviceid) {
-			res = cpu_to_be32(NFS4ERR_NOTSUPP);
-			break;
-		}
 		if (dev->cbd_notify_type == NOTIFY_DEVICEID4_CHANGE)
 			dprintk("%s: NOTIFY_DEVICEID4_CHANGE not supported, "
 				"deleting instead\n", __func__);
-		server->pnfs_curr_ld->delete_deviceid(clp, &dev->cbd_dev_id);
+		nfs4_delete_deviceid(clp, &dev->cbd_dev_id);
 	}
 
 out:
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 43284d9..c181a8b 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -881,7 +881,6 @@ static struct pnfs_layoutdriver_type filelayout_type = {
 	.read_pagelist		= filelayout_read_pagelist,
 	.write_pagelist		= filelayout_write_pagelist,
 	.free_deviceid_node	= filelayout_free_deveiceid_node,
-	.delete_deviceid	= filelayout_delete_deviceid,
 };
 
 static int __init nfs4filelayout_init(void)
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index 4289dbf..cebe01e 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -101,6 +101,5 @@ extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 extern void nfs4_fl_free_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
 struct nfs4_file_layout_dsaddr *
 get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id, gfp_t gfp_flags);
-void filelayout_delete_deviceid(const struct nfs_client *, const struct nfs4_deviceid *);
 
 #endif /* FS_NFS_NFS4FILELAYOUT_H */
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index f23a7f4..5914659 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -553,42 +553,6 @@ nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr)
 	nfs4_put_deviceid_node(&dsaddr->id_node);
 }
 
-static struct nfs4_file_layout_dsaddr *
-nfs4_fl_unhash_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
-{
-	struct nfs4_file_layout_dsaddr *d;
-	struct hlist_node *n;
-	long hash = nfs4_fl_deviceid_hash(id);
-
-	dprintk("%s: hash %ld\n", __func__, hash);
-	rcu_read_lock();
-	hlist_for_each_entry_rcu(d, n, &filelayout_deviceid_cache[hash], node)
-		if (d->nfs_client == clp && !memcmp(&d->deviceid, id, sizeof(*id)))
-			goto found;
-	rcu_read_unlock();
-	return NULL;
-
-found:
-	rcu_read_unlock();
-	spin_lock(&filelayout_deviceid_lock);
-	hlist_del_init_rcu(&d->node);
-	spin_unlock(&filelayout_deviceid_lock);
-	synchronize_rcu();
-
-	return d;
-}
-
-void
-filelayout_delete_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
-{
-	struct nfs4_file_layout_dsaddr *d;
-
-	d = nfs4_fl_unhash_deviceid(clp, id);
-	/* balance the initial ref taken in decode_and_add_device */
-	if (d && atomic_dec_and_test(&d->ref))
-		nfs4_fl_free_deviceid(d);
-}
-
 /*
  * Want res = (offset - layout->pattern_offset)/ layout->stripe_unit
  * Then: ((res + fsi) % dsaddr->stripe_count)
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 64ebd76..7f29e3a 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -94,7 +94,6 @@ struct pnfs_layoutdriver_type {
 	enum pnfs_try_status (*write_pagelist) (struct nfs_write_data *nfs_data, int how);
 
 	void (*free_deviceid_node) (struct nfs4_deviceid_node *);
-	void (*delete_deviceid)(const struct nfs_client *, const struct nfs4_deviceid *);
 };
 
 struct pnfs_layout_hdr {
@@ -174,6 +173,7 @@ struct nfs4_deviceid_node {
 void nfs4_print_deviceid(const struct nfs4_deviceid *dev_id);
 struct nfs4_deviceid_node *nfs4_find_get_deviceid(const struct nfs_client *, const struct nfs4_deviceid *);
 struct nfs4_deviceid_node *nfs4_unhash_put_deviceid(const struct nfs_client *, const struct nfs4_deviceid *);
+void nfs4_delete_deviceid(const struct nfs_client *, const struct nfs4_deviceid *);
 void nfs4_init_deviceid_node(struct nfs4_deviceid_node *,
 			     const struct pnfs_layoutdriver_type *,
 			     const struct nfs_client *,
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index 3cd7854..d205b18 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -94,6 +94,64 @@ fail:
 }
 EXPORT_SYMBOL_GPL(nfs4_find_get_deviceid);
 
+/*
+ * Unhash and put deviceid
+ *
+ * @clp nfs_client associated with deviceid
+ * @id the deviceid to unhash
+ *
+ * @ret the unhashed node, if found and dereferenced to zero, NULL otherwise.
+ */
+struct nfs4_deviceid_node *
+nfs4_unhash_put_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
+{
+	struct nfs4_deviceid_node *d;
+	struct hlist_node *n;
+	long hash = nfs4_deviceid_hash(id);
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(d, n, &nfs4_deviceid_cache[hash], node)
+		if (d->nfs_client == clp && !memcmp(&d->deviceid, id, sizeof(*id)))
+			goto found;
+	rcu_read_unlock();
+	return NULL;
+
+found:
+	rcu_read_unlock();
+	spin_lock(&nfs4_deviceid_lock);
+	hlist_del_init_rcu(&d->node);
+	spin_unlock(&nfs4_deviceid_lock);
+	synchronize_rcu();
+
+	/* balance the initial ref set in nfs4_init_deviceid_node */
+	if (atomic_dec_and_test(&d->ref))
+		return d;
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(nfs4_unhash_put_deviceid);
+
+/*
+ * Delete a deviceid from cache
+ *
+ * @clp struct nfs_client qualifying the deviceid
+ * @id deviceid to delete
+ */
+void
+nfs4_delete_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
+{
+	struct nfs4_deviceid_node *d;
+
+	d = nfs4_unhash_put_deviceid(clp, id);
+	if (!d)
+		return;
+	if (d->ld->free_deviceid_node)
+		d->ld->free_deviceid_node(d);
+	else
+		kfree(d);
+}
+EXPORT_SYMBOL_GPL(nfs4_delete_deviceid);
+
 void
 nfs4_init_deviceid_node(struct nfs4_deviceid_node *d,
 			const struct pnfs_layoutdriver_type *ld,
-- 
1.7.3.4



* [PATCH v5 09/38] SQUASHME: pnfs: refactor device cache _lookup_deviceid
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (7 preceding siblings ...)
  2011-05-22 23:47 ` [PATCH v5 08/38] SQUASHME: pnfs: use global deviceid cache for CB_NOTIFY_DEVICEID Benny Halevy
@ 2011-05-22 23:48 ` Benny Halevy
  2011-05-22 23:49 ` [PATCH v5 10/38] SQUASHME: pnfs: refactor device cache _find_get_deviceid Benny Halevy
                   ` (29 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:48 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
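For reviewers: _lookup_deviceid returns NULL when nothing matches, so a
caller that then takes a reference would presumably want to test the
pointer before touching the refcount. A minimal userspace sketch of
that lookup-then-get pattern follows. It is not the kernel code: the
node type and all function names are hypothetical, C11 atomics stand in
for atomic_t, a plain singly linked list stands in for the hash bucket,
and RCU and locking are left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct node {				/* hypothetical stand-in for nfs4_deviceid_node */
	struct node *next;
	const void *owner;		/* stands in for the nfs_client pointer */
	unsigned char id[16];		/* stands in for struct nfs4_deviceid */
	atomic_int ref;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool ref_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0)
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	return false;
}

/* Return the first live node matching (owner, id), or NULL. */
static struct node *lookup(struct node *head, const void *owner,
			   const unsigned char *id)
{
	assert(id != NULL);
	for (struct node *d = head; d; d = d->next)
		if (d->owner == owner &&
		    !memcmp(d->id, id, sizeof(d->id)) &&
		    atomic_load(&d->ref))
			return d;
	return NULL;
}

/* Look up and take a reference; NULL if absent or already dying. */
static struct node *find_get(struct node *head, const void *owner,
			     const unsigned char *id)
{
	struct node *d = lookup(head, owner, id);

	if (d && !ref_inc_not_zero(&d->ref))
		d = NULL;
	return d;
}
```

The `d &&` guard is the point of the sketch: with it, a miss in lookup
simply propagates as NULL instead of dereferencing a null pointer.
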
 fs/nfs/pnfs_dev.c |   49 +++++++++++++++++++++++++++----------------------
 1 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index d205b18..374fa66 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -66,6 +66,23 @@ nfs4_deviceid_hash(const struct nfs4_deviceid *id)
 	return x & NFS4_DEVICE_ID_HASH_MASK;
 }
 
+static struct nfs4_deviceid_node *
+_lookup_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id,
+		 long hash)
+{
+	struct nfs4_deviceid_node *d;
+	struct hlist_node *n;
+
+	hlist_for_each_entry_rcu(d, n, &nfs4_deviceid_cache[hash], node)
+		if (d->nfs_client == clp && !memcmp(&d->deviceid, id, sizeof(*id))) {
+			if (atomic_read(&d->ref))
+				return d;
+			else
+				continue;
+		}
+	return NULL;
+}
+
 /*
  * Lookup a deviceid in cache and get a reference count on it if found
  *
@@ -76,21 +93,13 @@ struct nfs4_deviceid_node *
 nfs4_find_get_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
 {
 	struct nfs4_deviceid_node *d;
-	struct hlist_node *n;
-	long hash = nfs4_deviceid_hash(id);
 
 	rcu_read_lock();
-	hlist_for_each_entry_rcu(d, n, &nfs4_deviceid_cache[hash], node) {
-		if (d->nfs_client == clp && !memcmp(&d->deviceid, id, sizeof(*id))) {
-			if (!atomic_inc_not_zero(&d->ref))
-				goto fail;
-			rcu_read_unlock();
-			return d;
-		}
-	}
-fail:
+	d = _lookup_deviceid(clp, id, nfs4_deviceid_hash(id));
+	if (!atomic_inc_not_zero(&d->ref))
+		d = NULL;
 	rcu_read_unlock();
-	return NULL;
+	return d;
 }
 EXPORT_SYMBOL_GPL(nfs4_find_get_deviceid);
 
@@ -106,19 +115,15 @@ struct nfs4_deviceid_node *
 nfs4_unhash_put_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
 {
 	struct nfs4_deviceid_node *d;
-	struct hlist_node *n;
-	long hash = nfs4_deviceid_hash(id);
 
+	spin_lock(&nfs4_deviceid_lock);
 	rcu_read_lock();
-	hlist_for_each_entry_rcu(d, n, &nfs4_deviceid_cache[hash], node)
-		if (d->nfs_client == clp && !memcmp(&d->deviceid, id, sizeof(*id)))
-			goto found;
+	d = _lookup_deviceid(clp, id, nfs4_deviceid_hash(id));
 	rcu_read_unlock();
-	return NULL;
-
-found:
-	rcu_read_unlock();
-	spin_lock(&nfs4_deviceid_lock);
+	if (!d) {
+		spin_unlock(&nfs4_deviceid_lock);
+		return NULL;
+	}
 	hlist_del_init_rcu(&d->node);
 	spin_unlock(&nfs4_deviceid_lock);
 	synchronize_rcu();
-- 
1.7.3.4



* [PATCH v5 10/38] SQUASHME: pnfs: refactor device cache _find_get_deviceid
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (8 preceding siblings ...)
  2011-05-22 23:48 ` [PATCH v5 09/38] SQUASHME: pnfs: refactor device cache _lookup_deviceid Benny Halevy
@ 2011-05-22 23:49 ` Benny Halevy
  2011-05-22 23:50 ` [PATCH v5 11/38] SUNRPC: introduce xdr_init_decode_pages Benny Halevy
                   ` (28 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:49 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/pnfs_dev.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index 374fa66..f830616 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -90,17 +90,24 @@ _lookup_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id,
  * @id deviceid to look up
  */
 struct nfs4_deviceid_node *
-nfs4_find_get_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
+_find_get_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id,
+		   long hash)
 {
 	struct nfs4_deviceid_node *d;
 
 	rcu_read_lock();
-	d = _lookup_deviceid(clp, id, nfs4_deviceid_hash(id));
+	d = _lookup_deviceid(clp, id, hash);
 	if (!atomic_inc_not_zero(&d->ref))
 		d = NULL;
 	rcu_read_unlock();
 	return d;
 }
+
+struct nfs4_deviceid_node *
+nfs4_find_get_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
+{
+	return _find_get_deviceid(clp, id, nfs4_deviceid_hash(id));
+}
 EXPORT_SYMBOL_GPL(nfs4_find_get_deviceid);
 
 /*
@@ -189,13 +196,13 @@ nfs4_insert_deviceid_node(struct nfs4_deviceid_node *new)
 	long hash;
 
 	spin_lock(&nfs4_deviceid_lock);
-	d = nfs4_find_get_deviceid(new->nfs_client, &new->deviceid);
+	hash = nfs4_deviceid_hash(&new->deviceid);
+	d = _find_get_deviceid(new->nfs_client, &new->deviceid, hash);
 	if (d) {
 		spin_unlock(&nfs4_deviceid_lock);
 		return d;
 	}
 
-	hash = nfs4_deviceid_hash(&new->deviceid);
 	hlist_add_head_rcu(&new->node, &nfs4_deviceid_cache[hash]);
 	spin_unlock(&nfs4_deviceid_lock);
 
-- 
1.7.3.4



* [PATCH v5 11/38] SUNRPC: introduce xdr_init_decode_pages
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (9 preceding siblings ...)
  2011-05-22 23:49 ` [PATCH v5 10/38] SQUASHME: pnfs: refactor device cache _find_get_deviceid Benny Halevy
@ 2011-05-22 23:50 ` Benny Halevy
  2011-05-22 23:51 ` [PATCH v5 12/38] pnfs: Use byte-range for layoutget Benny Halevy
                   ` (27 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:50 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Initialize xdr_stream and xdr_buf using an array of page pointers
and length of buffer.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
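For reviewers: the idea of the helper can be sketched in userspace from
the open-coded initializers it replaces in nfs_readdir_page_filler and
filelayout_decode_layout (pages, page_len, buflen and len all derived
from the page array and one length, followed by xdr_init_decode). The
mock_* types and functions below are hypothetical stand-ins, not the
sunrpc definitions, and the zeroing of the remaining buffer fields is
an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mock_page { unsigned char data[4096]; };

struct mock_xdr_buf {			/* only the fields the callers set */
	struct mock_page **pages;
	unsigned int page_len;
	unsigned int buflen;
	unsigned int len;
};

struct mock_xdr_stream {
	struct mock_xdr_buf *buf;
	unsigned int pos;		/* hypothetical read cursor */
};

/* Stand-in for xdr_init_decode(): bind the stream to the buffer. */
static void mock_xdr_init_decode(struct mock_xdr_stream *xdr,
				 struct mock_xdr_buf *buf)
{
	xdr->buf = buf;
	xdr->pos = 0;
}

/* Set up buf from a page array and a length, then start decoding. */
static void mock_xdr_init_decode_pages(struct mock_xdr_stream *xdr,
				       struct mock_xdr_buf *buf,
				       struct mock_page **pages,
				       unsigned int len)
{
	assert(xdr && buf);
	memset(buf, 0, sizeof(*buf));	/* assumption: unused fields zeroed */
	buf->pages = pages;
	buf->page_len = len;
	buf->buflen = len;
	buf->len = len;
	mock_xdr_init_decode(xdr, buf);
}
```

With a helper of this shape a caller only declares the buffer and makes
one call, which is what the dir.c and nfs4filelayout.c hunks below
reduce to.
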
 fs/nfs/dir.c               |    9 ++-------
 fs/nfs/nfs4filelayout.c    |    9 ++-------
 fs/nfs/nfs4filelayoutdev.c |    9 ++-------
 include/linux/sunrpc/xdr.h |    2 ++
 net/sunrpc/xdr.c           |   19 +++++++++++++++++++
 5 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 7237672..f673a9e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -512,12 +512,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
 				struct page **xdr_pages, struct page *page, unsigned int buflen)
 {
 	struct xdr_stream stream;
-	struct xdr_buf buf = {
-		.pages = xdr_pages,
-		.page_len = buflen,
-		.buflen = buflen,
-		.len = buflen,
-	};
+	struct xdr_buf buf;
 	struct page *scratch;
 	struct nfs_cache_array *array;
 	unsigned int count = 0;
@@ -527,7 +522,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
 	if (scratch == NULL)
 		return -ENOMEM;
 
-	xdr_init_decode(&stream, &buf, NULL);
+	xdr_init_decode_pages(&stream, &buf, xdr_pages, buflen);
 	xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
 
 	do {
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index c181a8b..5b3080d 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -509,12 +509,7 @@ filelayout_decode_layout(struct pnfs_layout_hdr *flo,
 			 gfp_t gfp_flags)
 {
 	struct xdr_stream stream;
-	struct xdr_buf buf = {
-		.pages =  lgr->layoutp->pages,
-		.page_len =  lgr->layoutp->len,
-		.buflen =  lgr->layoutp->len,
-		.len = lgr->layoutp->len,
-	};
+	struct xdr_buf buf;
 	struct page *scratch;
 	__be32 *p;
 	uint32_t nfl_util;
@@ -526,7 +521,7 @@ filelayout_decode_layout(struct pnfs_layout_hdr *flo,
 	if (!scratch)
 		return -ENOMEM;
 
-	xdr_init_decode(&stream, &buf, NULL);
+	xdr_init_decode_pages(&stream, &buf, lgr->layoutp->pages, lgr->layoutp->len);
 	xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
 
 	/* 20 = ufl_util (4), first_stripe_index (4), pattern_offset (8),
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index 5914659..3b7bf13 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -308,12 +308,7 @@ decode_device(struct inode *ino, struct pnfs_device *pdev, gfp_t gfp_flags)
 	u8 max_stripe_index;
 	struct nfs4_file_layout_dsaddr *dsaddr = NULL;
 	struct xdr_stream stream;
-	struct xdr_buf buf = {
-		.pages = pdev->pages,
-		.page_len = pdev->pglen,
-		.buflen = pdev->pglen,
-		.len = pdev->pglen,
-	};
+	struct xdr_buf buf;
 	struct page *scratch;
 
 	/* set up xdr stream */
@@ -321,7 +316,7 @@ decode_device(struct inode *ino, struct pnfs_device *pdev, gfp_t gfp_flags)
 	if (!scratch)
 		goto out_err;
 
-	xdr_init_decode(&stream, &buf, NULL);
+	xdr_init_decode_pages(&stream, &buf, pdev->pages, pdev->pglen);
 	xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
 
 	/* Get the stripe count (number of stripe index) */
diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index fc84b7a..a20970e 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -216,6 +216,8 @@ extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
 extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages,
 		unsigned int base, unsigned int len);
 extern void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p);
+extern void xdr_init_decode_pages(struct xdr_stream *xdr, struct xdr_buf *buf,
+		struct page **pages, unsigned int len);
 extern void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen);
 extern __be32 *xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes);
 extern void xdr_read_pages(struct xdr_stream *xdr, unsigned int len);
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 679cd67..f008c14 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -638,6 +638,25 @@ void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p)
 }
 EXPORT_SYMBOL_GPL(xdr_init_decode);
 
+/**
+ * xdr_init_decode_pages - Initialize an xdr_stream for decoding data from pages.
+ * @xdr: pointer to xdr_stream struct
+ * @buf: pointer to XDR buffer from which to decode data
+ * @pages: array of pages holding the data to decode
+ * @len: length in bytes of buffer in pages
+ */
+void xdr_init_decode_pages(struct xdr_stream *xdr, struct xdr_buf *buf,
+			   struct page **pages, unsigned int len)
+{
+	memset(buf, 0, sizeof(*buf));
+	buf->pages = pages;
+	buf->page_len = len;
+	buf->buflen = len;
+	buf->len = len;
+	xdr_init_decode(xdr, buf, NULL);
+}
+EXPORT_SYMBOL_GPL(xdr_init_decode_pages);
+
 static __be32 * __xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes)
 {
 	__be32 *p = xdr->p;
-- 
1.7.3.4



* [PATCH v5 12/38] pnfs: Use byte-range for layoutget
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (10 preceding siblings ...)
  2011-05-22 23:50 ` [PATCH v5 11/38] SUNRPC: introduce xdr_init_decode_pages Benny Halevy
@ 2011-05-22 23:51 ` Benny Halevy
  2011-05-22 23:51 ` [PATCH v5 13/38] pnfs: align layoutget requests on page boundaries Benny Halevy
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:51 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Add offset and count parameters to pnfs_update_layout and use them to get
the layout in the pageio path.

Sort cached layout segments by the following keys, in order:
* offset (ascending)
* length (descending)
* iomode (RW before READ)
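
A minimal userspace sketch of that three-key ordering, assuming a range is
described by an iomode, an offset, and a length. The sketch_* names are
hypothetical and this is not the patch's cmp_layout(); it only illustrates the
sort order listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the iomode values and range fields. */
enum sketch_iomode { SK_IOMODE_READ = 1, SK_IOMODE_RW = 2 };

struct sketch_range {
	enum sketch_iomode iomode;
	uint64_t offset;
	uint64_t length;
};

/* Returns <0 if a sorts before b, >0 if after, 0 if equivalent. */
static int sketch_cmp_range(const struct sketch_range *a,
			    const struct sketch_range *b)
{
	assert(a && b);
	if (a->offset != b->offset)	/* lower offset first */
		return a->offset < b->offset ? -1 : 1;
	if (a->length != b->length)	/* longer length first */
		return a->length > b->length ? -1 : 1;
	/* RW before READ */
	return (a->iomode == SK_IOMODE_READ) - (b->iomode == SK_IOMODE_READ);
}
```

The sketch compares the 64-bit keys explicitly instead of subtracting them,
so the sign of the result does not depend on how a large unsigned difference
converts to a signed type. That is a choice of this illustration only.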

Test the byte range against the layout segment in use in
pnfs_{read,write}_pg_test so as not to coalesce pages that do not use the
same layout segment.
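
The byte-range tests rest on two ideas: a half-open range [offset, offset +
length) whose end saturates at the maximum 64-bit value on overflow, and an
intersection check built on that end. A userspace sketch under those
assumptions follows; the sk_* names are hypothetical and SK_MAX_UINT64 stands
in for NFS4_MAX_UINT64.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAX_UINT64 UINT64_MAX	/* stands in for NFS4_MAX_UINT64 */

struct sk_range {
	uint64_t offset;
	uint64_t length;
};

/* Exclusive end of [start, start + len), saturating on overflow. */
static uint64_t sk_end_offset(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;

	return end >= start ? end : SK_MAX_UINT64;
}

/* Non-zero if the two half-open ranges share at least one byte. */
static int sk_intersecting(const struct sk_range *a, const struct sk_range *b)
{
	uint64_t end_a, end_b;

	assert(a && b);
	end_a = sk_end_offset(a->offset, a->length);
	end_b = sk_end_offset(b->offset, b->length);
	return (end_a == SK_MAX_UINT64 || end_a > b->offset) &&
	       (end_b == SK_MAX_UINT64 || end_b > a->offset);
}
```

Two adjacent ranges such as [0, 4096) and [4096, 8192) do not intersect here,
while a range whose end saturates is treated as extending to end of file.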

[fix lseg ordering]
[clean up pnfs_find_lseg lseg arg]
[remove unnecessary FIXME]
[fix ordering in pnfs_insert_layout]
[clean up pnfs_insert_layout]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/pnfs.c  |  165 ++++++++++++++++++++++++++++++++++++++++++-------------
 fs/nfs/pnfs.h  |    6 ++-
 fs/nfs/read.c  |    8 ++-
 fs/nfs/write.c |    8 ++-
 4 files changed, 142 insertions(+), 45 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index f57f528..c2f09e9 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -261,6 +261,65 @@ put_lseg(struct pnfs_layout_segment *lseg)
 }
 EXPORT_SYMBOL_GPL(put_lseg);
 
+static inline u64
+end_offset(u64 start, u64 len)
+{
+	u64 end;
+
+	end = start + len;
+	return end >= start ? end : NFS4_MAX_UINT64;
+}
+
+/* last octet in a range */
+static inline u64
+last_byte_offset(u64 start, u64 len)
+{
+	u64 end;
+
+	BUG_ON(!len);
+	end = start + len;
+	return end > start ? end - 1 : NFS4_MAX_UINT64;
+}
+
+/*
+ * is l2 fully contained in l1?
+ *   start1                             end1
+ *   [----------------------------------)
+ *           start2           end2
+ *           [----------------)
+ */
+static inline int
+lo_seg_contained(struct pnfs_layout_range *l1,
+		 struct pnfs_layout_range *l2)
+{
+	u64 start1 = l1->offset;
+	u64 end1 = end_offset(start1, l1->length);
+	u64 start2 = l2->offset;
+	u64 end2 = end_offset(start2, l2->length);
+
+	return (start1 <= start2) && (end1 >= end2);
+}
+
+/*
+ * are l1 and l2 intersecting?
+ *   start1                             end1
+ *   [----------------------------------)
+ *                              start2           end2
+ *                              [----------------)
+ */
+static inline int
+lo_seg_intersecting(struct pnfs_layout_range *l1,
+		    struct pnfs_layout_range *l2)
+{
+	u64 start1 = l1->offset;
+	u64 end1 = end_offset(start1, l1->length);
+	u64 start2 = l2->offset;
+	u64 end2 = end_offset(start2, l2->length);
+
+	return (end1 == NFS4_MAX_UINT64 || end1 > start2) &&
+	       (end2 == NFS4_MAX_UINT64 || end2 > start1);
+}
+
 static bool
 should_free_lseg(u32 lseg_iomode, u32 recall_iomode)
 {
@@ -467,7 +526,7 @@ pnfs_choose_layoutget_stateid(nfs4_stateid *dst, struct pnfs_layout_hdr *lo,
 static struct pnfs_layout_segment *
 send_layoutget(struct pnfs_layout_hdr *lo,
 	   struct nfs_open_context *ctx,
-	   u32 iomode,
+	   struct pnfs_layout_range *range,
 	   gfp_t gfp_flags)
 {
 	struct inode *ino = lo->plh_inode;
@@ -499,11 +558,11 @@ send_layoutget(struct pnfs_layout_hdr *lo,
 			goto out_err_free;
 	}
 
-	lgp->args.minlength = NFS4_MAX_UINT64;
+	lgp->args.minlength = PAGE_CACHE_SIZE;
+	if (lgp->args.minlength > range->length)
+		lgp->args.minlength = range->length;
 	lgp->args.maxcount = PNFS_LAYOUT_MAXSIZE;
-	lgp->args.range.iomode = iomode;
-	lgp->args.range.offset = 0;
-	lgp->args.range.length = NFS4_MAX_UINT64;
+	lgp->args.range = *range;
 	lgp->args.type = server->pnfs_curr_ld->id;
 	lgp->args.inode = ino;
 	lgp->args.ctx = get_nfs_open_context(ctx);
@@ -518,7 +577,7 @@ send_layoutget(struct pnfs_layout_hdr *lo,
 	nfs4_proc_layoutget(lgp);
 	if (!lseg) {
 		/* remember that LAYOUTGET failed and suspend trying */
-		set_bit(lo_fail_bit(iomode), &lo->plh_flags);
+		set_bit(lo_fail_bit(range->iomode), &lo->plh_flags);
 	}
 
 	/* free xdr pages */
@@ -625,10 +684,23 @@ bool pnfs_roc_drain(struct inode *ino, u32 *barrier)
  * are seen first.
  */
 static s64
-cmp_layout(u32 iomode1, u32 iomode2)
+cmp_layout(struct pnfs_layout_range *l1,
+	   struct pnfs_layout_range *l2)
 {
+	s64 d;
+
+	/* high offset > low offset */
+	d = l1->offset - l2->offset;
+	if (d)
+		return d;
+
+	/* short length > long length */
+	d = l2->length - l1->length;
+	if (d)
+		return d;
+
 	/* read > read/write */
-	return (int)(iomode2 == IOMODE_READ) - (int)(iomode1 == IOMODE_READ);
+	return (int)(l1->iomode == IOMODE_READ) - (int)(l2->iomode == IOMODE_READ);
 }
 
 static void
@@ -636,13 +708,12 @@ pnfs_insert_layout(struct pnfs_layout_hdr *lo,
 		   struct pnfs_layout_segment *lseg)
 {
 	struct pnfs_layout_segment *lp;
-	int found = 0;
 
 	dprintk("%s:Begin\n", __func__);
 
 	assert_spin_locked(&lo->plh_inode->i_lock);
 	list_for_each_entry(lp, &lo->plh_segs, pls_list) {
-		if (cmp_layout(lp->pls_range.iomode, lseg->pls_range.iomode) > 0)
+		if (cmp_layout(&lseg->pls_range, &lp->pls_range) > 0)
 			continue;
 		list_add_tail(&lseg->pls_list, &lp->pls_list);
 		dprintk("%s: inserted lseg %p "
@@ -652,16 +723,14 @@ pnfs_insert_layout(struct pnfs_layout_hdr *lo,
 			lseg->pls_range.offset, lseg->pls_range.length,
 			lp, lp->pls_range.iomode, lp->pls_range.offset,
 			lp->pls_range.length);
-		found = 1;
-		break;
-	}
-	if (!found) {
-		list_add_tail(&lseg->pls_list, &lo->plh_segs);
-		dprintk("%s: inserted lseg %p "
-			"iomode %d offset %llu length %llu at tail\n",
-			__func__, lseg, lseg->pls_range.iomode,
-			lseg->pls_range.offset, lseg->pls_range.length);
+		goto out;
 	}
+	list_add_tail(&lseg->pls_list, &lo->plh_segs);
+	dprintk("%s: inserted lseg %p "
+		"iomode %d offset %llu length %llu at tail\n",
+		__func__, lseg, lseg->pls_range.iomode,
+		lseg->pls_range.offset, lseg->pls_range.length);
+out:
 	get_layout_hdr(lo);
 
 	dprintk("%s:Return\n", __func__);
@@ -721,16 +790,28 @@ pnfs_find_alloc_layout(struct inode *ino, gfp_t gfp_flags)
  * READ		RW	true
  */
 static int
-is_matching_lseg(struct pnfs_layout_segment *lseg, u32 iomode)
+is_matching_lseg(struct pnfs_layout_range *ls_range,
+		 struct pnfs_layout_range *range)
 {
-	return (iomode != IOMODE_RW || lseg->pls_range.iomode == IOMODE_RW);
+	struct pnfs_layout_range range1;
+
+	if ((range->iomode == IOMODE_RW &&
+	     ls_range->iomode != IOMODE_RW) ||
+	    !lo_seg_intersecting(ls_range, range))
+		return 0;
+
+	/* range1 covers only the first byte in the range */
+	range1 = *range;
+	range1.length = 1;
+	return lo_seg_contained(ls_range, &range1);
 }
 
 /*
  * lookup range in layout
  */
 static struct pnfs_layout_segment *
-pnfs_find_lseg(struct pnfs_layout_hdr *lo, u32 iomode)
+pnfs_find_lseg(struct pnfs_layout_hdr *lo,
+		struct pnfs_layout_range *range)
 {
 	struct pnfs_layout_segment *lseg, *ret = NULL;
 
@@ -739,11 +820,11 @@ pnfs_find_lseg(struct pnfs_layout_hdr *lo, u32 iomode)
 	assert_spin_locked(&lo->plh_inode->i_lock);
 	list_for_each_entry(lseg, &lo->plh_segs, pls_list) {
 		if (test_bit(NFS_LSEG_VALID, &lseg->pls_flags) &&
-		    is_matching_lseg(lseg, iomode)) {
+		    is_matching_lseg(&lseg->pls_range, range)) {
 			ret = get_lseg(lseg);
 			break;
 		}
-		if (cmp_layout(iomode, lseg->pls_range.iomode) > 0)
+		if (cmp_layout(range, &lseg->pls_range) > 0)
 			break;
 	}
 
@@ -759,9 +840,16 @@ pnfs_find_lseg(struct pnfs_layout_hdr *lo, u32 iomode)
 struct pnfs_layout_segment *
 pnfs_update_layout(struct inode *ino,
 		   struct nfs_open_context *ctx,
+		   loff_t pos,
+		   u64 count,
 		   enum pnfs_iomode iomode,
 		   gfp_t gfp_flags)
 {
+	struct pnfs_layout_range arg = {
+		.iomode = iomode,
+		.offset = pos,
+		.length = count,
+	};
 	struct nfs_inode *nfsi = NFS_I(ino);
 	struct nfs_client *clp = NFS_SERVER(ino)->nfs_client;
 	struct pnfs_layout_hdr *lo;
@@ -789,7 +877,7 @@ pnfs_update_layout(struct inode *ino,
 		goto out_unlock;
 
 	/* Check to see if the layout for the given range already exists */
-	lseg = pnfs_find_lseg(lo, iomode);
+	lseg = pnfs_find_lseg(lo, &arg);
 	if (lseg)
 		goto out_unlock;
 
@@ -811,7 +899,7 @@ pnfs_update_layout(struct inode *ino,
 		spin_unlock(&clp->cl_lock);
 	}
 
-	lseg = send_layoutget(lo, ctx, iomode, gfp_flags);
+	lseg = send_layoutget(lo, ctx, &arg, gfp_flags);
 	if (!lseg && first) {
 		spin_lock(&clp->cl_lock);
 		list_del_init(&lo->plh_layouts);
@@ -838,17 +926,6 @@ pnfs_layout_process(struct nfs4_layoutget *lgp)
 	struct nfs_client *clp = NFS_SERVER(ino)->nfs_client;
 	int status = 0;
 
-	/* Verify we got what we asked for.
-	 * Note that because the xdr parsing only accepts a single
-	 * element array, this can fail even if the server is behaving
-	 * correctly.
-	 */
-	if (lgp->args.range.iomode > res->range.iomode ||
-	    res->range.offset != 0 ||
-	    res->range.length != NFS4_MAX_UINT64) {
-		status = -EINVAL;
-		goto out;
-	}
 	/* Inject layout blob into I/O device driver */
 	lseg = NFS_SERVER(ino)->pnfs_curr_ld->alloc_lseg(lo, res, lgp->gfp_flags);
 	if (!lseg || IS_ERR(lseg)) {
@@ -903,9 +980,14 @@ static int pnfs_read_pg_test(struct nfs_pageio_descriptor *pgio,
 		/* This is first coelesce call for a series of nfs_pages */
 		pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
 						   prev->wb_context,
+						   req_offset(req),
+						   pgio->pg_count,
 						   IOMODE_READ,
 						   GFP_KERNEL);
-	}
+	} else if (pgio->pg_lseg &&
+		   req_offset(req) > end_offset(pgio->pg_lseg->pls_range.offset,
+						pgio->pg_lseg->pls_range.length))
+		return 0;
 	return NFS_SERVER(pgio->pg_inode)->pnfs_curr_ld->pg_test(pgio, prev, req);
 }
 
@@ -926,9 +1008,14 @@ static int pnfs_write_pg_test(struct nfs_pageio_descriptor *pgio,
 		/* This is first coelesce call for a series of nfs_pages */
 		pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
 						   prev->wb_context,
+						   req_offset(req),
+						   pgio->pg_count,
 						   IOMODE_RW,
 						   GFP_NOFS);
-	}
+	} else if (pgio->pg_lseg &&
+		   req_offset(req) > end_offset(pgio->pg_lseg->pls_range.offset,
+						pgio->pg_lseg->pls_range.length))
+		return 0;
 	return NFS_SERVER(pgio->pg_inode)->pnfs_curr_ld->pg_test(pgio, prev, req);
 }
 
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 7f29e3a..f1fb823 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -131,7 +131,8 @@ void get_layout_hdr(struct pnfs_layout_hdr *lo);
 void put_lseg(struct pnfs_layout_segment *lseg);
 struct pnfs_layout_segment *
 pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
-		   enum pnfs_iomode access_type, gfp_t gfp_flags);
+		   loff_t pos, u64 count, enum pnfs_iomode access_type,
+		   gfp_t gfp_flags);
 void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
 void unset_pnfs_layoutdriver(struct nfs_server *);
 enum pnfs_try_status pnfs_try_to_write_data(struct nfs_write_data *,
@@ -271,7 +272,8 @@ static inline void put_lseg(struct pnfs_layout_segment *lseg)
 
 static inline struct pnfs_layout_segment *
 pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
-		   enum pnfs_iomode access_type, gfp_t gfp_flags)
+		   loff_t pos, u64 count, enum pnfs_iomode access_type,
+		   gfp_t gfp_flags)
 {
 	return NULL;
 }
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 2bcf0dc..540c8bc 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -288,7 +288,9 @@ static int nfs_pagein_multi(struct nfs_pageio_descriptor *desc)
 	atomic_set(&req->wb_complete, requests);
 
 	BUG_ON(desc->pg_lseg != NULL);
-	lseg = pnfs_update_layout(desc->pg_inode, req->wb_context, IOMODE_READ, GFP_KERNEL);
+	lseg = pnfs_update_layout(desc->pg_inode, req->wb_context,
+				  req_offset(req), desc->pg_count,
+				  IOMODE_READ, GFP_KERNEL);
 	ClearPageError(page);
 	offset = 0;
 	nbytes = desc->pg_count;
@@ -351,7 +353,9 @@ static int nfs_pagein_one(struct nfs_pageio_descriptor *desc)
 	}
 	req = nfs_list_entry(data->pages.next);
 	if ((!lseg) && list_is_singular(&data->pages))
-		lseg = pnfs_update_layout(desc->pg_inode, req->wb_context, IOMODE_READ, GFP_KERNEL);
+		lseg = pnfs_update_layout(desc->pg_inode, req->wb_context,
+					  req_offset(req), desc->pg_count,
+					  IOMODE_READ, GFP_KERNEL);
 
 	ret = nfs_read_rpcsetup(req, data, &nfs_read_full_ops, desc->pg_count,
 				0, lseg);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 49c715b..7edb72f 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -939,7 +939,9 @@ static int nfs_flush_multi(struct nfs_pageio_descriptor *desc)
 	atomic_set(&req->wb_complete, requests);
 
 	BUG_ON(desc->pg_lseg);
-	lseg = pnfs_update_layout(desc->pg_inode, req->wb_context, IOMODE_RW, GFP_NOFS);
+	lseg = pnfs_update_layout(desc->pg_inode, req->wb_context,
+				  req_offset(req), desc->pg_count,
+				  IOMODE_RW, GFP_NOFS);
 	ClearPageError(page);
 	offset = 0;
 	nbytes = desc->pg_count;
@@ -1013,7 +1015,9 @@ static int nfs_flush_one(struct nfs_pageio_descriptor *desc)
 	}
 	req = nfs_list_entry(data->pages.next);
 	if ((!lseg) && list_is_singular(&data->pages))
-		lseg = pnfs_update_layout(desc->pg_inode, req->wb_context, IOMODE_RW, GFP_NOFS);
+		lseg = pnfs_update_layout(desc->pg_inode, req->wb_context,
+					  req_offset(req), desc->pg_count,
+					  IOMODE_RW, GFP_NOFS);
 
 	if ((desc->pg_ioflags & FLUSH_COND_STABLE) &&
 	    (desc->pg_moreio || NFS_I(desc->pg_inode)->ncommit))
-- 
1.7.3.4



* [PATCH v5 13/38] pnfs: align layoutget requests on page boundaries
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (11 preceding siblings ...)
  2011-05-22 23:51 ` [PATCH v5 12/38] pnfs: Use byte-range for layoutget Benny Halevy
@ 2011-05-22 23:51 ` Benny Halevy
  2011-05-22 23:52 ` [PATCH v5 14/38] pnfs: Use byte-range for cb_layoutrecall Benny Halevy
                   ` (25 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:51 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/pnfs.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index c2f09e9..2357ee3 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -850,6 +850,7 @@ pnfs_update_layout(struct inode *ino,
 		.offset = pos,
 		.length = count,
 	};
+	unsigned pg_offset;
 	struct nfs_inode *nfsi = NFS_I(ino);
 	struct nfs_client *clp = NFS_SERVER(ino)->nfs_client;
 	struct pnfs_layout_hdr *lo;
@@ -899,6 +900,13 @@ pnfs_update_layout(struct inode *ino,
 		spin_unlock(&clp->cl_lock);
 	}
 
+	pg_offset = arg.offset & ~PAGE_CACHE_MASK;
+	if (pg_offset) {
+		arg.offset -= pg_offset;
+		arg.length += pg_offset;
+	}
+	arg.length = PAGE_CACHE_ALIGN(arg.length);
+
 	lseg = send_layoutget(lo, ctx, &arg, gfp_flags);
 	if (!lseg && first) {
 		spin_lock(&clp->cl_lock);
-- 
1.7.3.4



* [PATCH v5 14/38] pnfs: Use byte-range for cb_layoutrecall
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (12 preceding siblings ...)
  2011-05-22 23:51 ` [PATCH v5 13/38] pnfs: align layoutget requests on page boundaries Benny Halevy
@ 2011-05-22 23:52 ` Benny Halevy
  2011-05-22 23:53 ` [PATCH v5 15/38] pnfs: client stats Benny Halevy
                   ` (24 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:52 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Use recalled range to invalidate particular layout segments in the layout cache.
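
A userspace sketch of the matching rule, assuming a cached segment is freed
when the recall's iomode is "any" or equals the segment's iomode, and the two
byte ranges overlap. The sk_* names are hypothetical; the iomode constants
follow the NFSv4.1 layoutiomode4 numbering (READ = 1, RW = 2, ANY = 3).

```c
#include <assert.h>
#include <stdint.h>

enum { SK_IOMODE_READ = 1, SK_IOMODE_RW = 2, SK_IOMODE_ANY = 3 };

struct sk_recall_range {
	uint32_t iomode;
	uint64_t offset;
	uint64_t length;
};

/* Exclusive end of the range, saturating on 64-bit overflow. */
static uint64_t sk_recall_end(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;

	return end >= start ? end : UINT64_MAX;
}

/* Non-zero if a recall of "recall" should invalidate segment "lseg". */
static int sk_should_free(const struct sk_recall_range *lseg,
			  const struct sk_recall_range *recall)
{
	uint64_t lseg_end, recall_end;

	assert(lseg && recall);
	if (recall->iomode != SK_IOMODE_ANY && lseg->iomode != recall->iomode)
		return 0;
	lseg_end = sk_recall_end(lseg->offset, lseg->length);
	recall_end = sk_recall_end(recall->offset, recall->length);
	return (lseg_end == UINT64_MAX || lseg_end > recall->offset) &&
	       (recall_end == UINT64_MAX || recall_end > lseg->offset);
}
```

A recall that only touches bytes past the end of a segment leaves that
segment in the cache.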

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/callback_proc.c |    4 ++--
 fs/nfs/pnfs.c          |   15 +++++++++------
 fs/nfs/pnfs.h          |    2 +-
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index fb5e5b9..5780c37 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -139,7 +139,7 @@ static u32 initiate_file_draining(struct nfs_client *clp,
 	spin_lock(&ino->i_lock);
 	if (test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags) ||
 	    mark_matching_lsegs_invalid(lo, &free_me_list,
-					args->cbl_range.iomode))
+					&args->cbl_range))
 		rv = NFS4ERR_DELAY;
 	else
 		rv = NFS4ERR_NOMATCHING_LAYOUT;
@@ -184,7 +184,7 @@ static u32 initiate_bulk_draining(struct nfs_client *clp,
 		ino = lo->plh_inode;
 		spin_lock(&ino->i_lock);
 		set_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags);
-		if (mark_matching_lsegs_invalid(lo, &free_me_list, range.iomode))
+		if (mark_matching_lsegs_invalid(lo, &free_me_list, &range))
 			rv = NFS4ERR_DELAY;
 		list_del_init(&lo->plh_bulk_recall);
 		spin_unlock(&ino->i_lock);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 2357ee3..20436a5 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -321,10 +321,12 @@ lo_seg_intersecting(struct pnfs_layout_range *l1,
 }
 
 static bool
-should_free_lseg(u32 lseg_iomode, u32 recall_iomode)
+should_free_lseg(struct pnfs_layout_range *lseg_range,
+		 struct pnfs_layout_range *recall_range)
 {
-	return (recall_iomode == IOMODE_ANY ||
-		lseg_iomode == recall_iomode);
+	return (recall_range->iomode == IOMODE_ANY ||
+		lseg_range->iomode == recall_range->iomode) &&
+	       lo_seg_intersecting(lseg_range, recall_range);
 }
 
 /* Returns 1 if lseg is removed from list, 0 otherwise */
@@ -355,7 +357,7 @@ static int mark_lseg_invalid(struct pnfs_layout_segment *lseg,
 int
 mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
 			    struct list_head *tmp_list,
-			    u32 iomode)
+			    struct pnfs_layout_range *recall_range)
 {
 	struct pnfs_layout_segment *lseg, *next;
 	int invalid = 0, removed = 0;
@@ -368,7 +370,8 @@ mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
 		return 0;
 	}
 	list_for_each_entry_safe(lseg, next, &lo->plh_segs, pls_list)
-		if (should_free_lseg(lseg->pls_range.iomode, iomode)) {
+		if (!recall_range ||
+		    should_free_lseg(&lseg->pls_range, recall_range)) {
 			dprintk("%s: freeing lseg %p iomode %d "
 				"offset %llu length %llu\n", __func__,
 				lseg, lseg->pls_range.iomode, lseg->pls_range.offset,
@@ -417,7 +420,7 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
 	lo = nfsi->layout;
 	if (lo) {
 		lo->plh_block_lgets++; /* permanently block new LAYOUTGETs */
-		mark_matching_lsegs_invalid(lo, &tmp_list, IOMODE_ANY);
+		mark_matching_lsegs_invalid(lo, &tmp_list, NULL);
 	}
 	spin_unlock(&nfsi->vfs_inode.i_lock);
 	pnfs_free_lseg_list(&tmp_list);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index f1fb823..7417be9 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -154,7 +154,7 @@ int pnfs_choose_layoutget_stateid(nfs4_stateid *dst,
 				  struct nfs4_state *open_state);
 int mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
 				struct list_head *tmp_list,
-				u32 iomode);
+				struct pnfs_layout_range *recall_range);
 bool pnfs_roc(struct inode *ino);
 void pnfs_roc_release(struct inode *ino);
 void pnfs_roc_set_barrier(struct inode *ino, u32 barrier);
-- 
1.7.3.4



* [PATCH v5 15/38] pnfs: client stats
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (13 preceding siblings ...)
  2011-05-22 23:52 ` [PATCH v5 14/38] pnfs: Use byte-range for cb_layoutrecall Benny Halevy
@ 2011-05-22 23:53 ` Benny Halevy
  2011-05-22 23:54 ` [PATCH v5 16/38] pnfs-obj: objlayoutdriver module skeleton Benny Halevy
                   ` (23 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:53 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

From: J. Bruce Fields <bfields@fieldses.org>

A pNFS client auto-negotiates a lot of features (minorversion level,
pNFS layout type, etc.).  This is convenient, but makes certain kinds of
failures hard for a user to detect.

For example, if the client falls back on 4.0, or falls back to MDS IO
because the user didn't connect to the right iSCSI disks before
mounting, the only symptoms may be reduced performance, which may not be
noticed till long after the actual failure, and may be difficult for a
user to diagnose.

However, such "failures" may also be perfectly normal in some cases, so
we don't want to spam the system logs with them.

One approach would be to put some more information into
/proc/self/mountstats.
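
As an illustration of the kind of field this adds, the userspace sketch below
formats a ",pnfs=" entry from an optional layout driver name, falling back to
"not configured" when no driver is set. sk_show_pnfs() is a hypothetical
helper that writes into a caller-supplied buffer; the kernel code writes to a
seq_file instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the pnfs mountstats field into "out".
 * "ld_name" is the layout driver name, or NULL when none is in use.
 * Returns the number of characters snprintf() would have written.
 */
static int sk_show_pnfs(char *out, size_t size, const char *ld_name)
{
	assert(out != NULL && size > 0);
	return snprintf(out, size, ",pnfs=%s",
			ld_name ? ld_name : "not configured");
}
```

With this field present, a silent fallback to MDS I/O shows up as
"pnfs=not configured" without anything being written to the system log.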

Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: add commit client stats]
[fixup data types for "ret" variables in pnfs_try_to* inline funcs.]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[fix definition of show_pnfs for !CONFIG_PNFS]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Fix show_sessions in the not CONFIG_NFS_V4_1 case]
    There is a build error when CONFIG_NFS_V4 is set but
    CONFIG_NFS_V4_1 is *not* set. show_sessions() prototype
    was unbalanced between the two cases.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[pnfs: super.c remove CONFIG_PNFS]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/super.c |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index e288f06..ce40e5c 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -63,6 +63,7 @@
 #include "iostat.h"
 #include "internal.h"
 #include "fscache.h"
+#include "pnfs.h"
 
 #define NFSDBG_FACILITY		NFSDBG_VFS
 
@@ -732,6 +733,28 @@ static int nfs_show_options(struct seq_file *m, struct vfsmount *mnt)
 
 	return 0;
 }
+#ifdef CONFIG_NFS_V4_1
+void show_sessions(struct seq_file *m, struct nfs_server *server)
+{
+	if (nfs4_has_session(server->nfs_client))
+		seq_printf(m, ",sessions");
+}
+#else
+void show_sessions(struct seq_file *m, struct nfs_server *server) {}
+#endif
+
+#ifdef CONFIG_NFS_V4_1
+void show_pnfs(struct seq_file *m, struct nfs_server *server)
+{
+	seq_printf(m, ",pnfs=");
+	if (server->pnfs_curr_ld)
+		seq_printf(m, "%s", server->pnfs_curr_ld->name);
+	else
+		seq_printf(m, "not configured");
+}
+#else  /* CONFIG_NFS_V4_1 */
+void show_pnfs(struct seq_file *m, struct nfs_server *server) {}
+#endif /* CONFIG_NFS_V4_1 */
 
 static int nfs_show_devname(struct seq_file *m, struct vfsmount *mnt)
 {
@@ -792,6 +815,8 @@ static int nfs_show_stats(struct seq_file *m, struct vfsmount *mnt)
 		seq_printf(m, "bm0=0x%x", nfss->attr_bitmask[0]);
 		seq_printf(m, ",bm1=0x%x", nfss->attr_bitmask[1]);
 		seq_printf(m, ",acl=0x%x", nfss->acl_bitmask);
+		show_sessions(m, nfss);
+		show_pnfs(m, nfss);
 	}
 #endif
 
-- 
1.7.3.4



* [PATCH v5 16/38] pnfs-obj: objlayoutdriver module skeleton
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (14 preceding siblings ...)
  2011-05-22 23:53 ` [PATCH v5 15/38] pnfs: client stats Benny Halevy
@ 2011-05-22 23:54 ` Benny Halevy
  2011-05-22 23:55 ` [PATCH v5 17/38] pnfs-obj: pnfs_osd XDR definitions Benny Halevy
                   ` (22 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:54 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

* Define the PNFS_OBJLAYOUT Kconfig option in the nfs
  master Kconfig file.
* Add the objlayout driver to the kernel's Kbuild system.
* Add the fs/nfs/objlayout/Kbuild file for building the
  objlayoutdriver.ko driver
* Define fs/nfs/objlayout/objio_osd.c, register the driver on module
  initialization and unregister on exit.

[pnfs-obj: remove of CONFIG_PNFS fallout]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[added "unsure" clause]
[depend on NFS_V4_1]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/Kconfig               |   10 +++++
 fs/nfs/Makefile              |    2 +
 fs/nfs/objlayout/Kbuild      |    5 +++
 fs/nfs/objlayout/objio_osd.c |   76 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 93 insertions(+), 0 deletions(-)
 create mode 100644 fs/nfs/objlayout/Kbuild
 create mode 100644 fs/nfs/objlayout/objio_osd.c

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index ba30665..8151554 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -87,6 +87,16 @@ config NFS_V4_1
 config PNFS_FILE_LAYOUT
 	tristate
 
+config PNFS_OBJLAYOUT
+	tristate "Provide support for the pNFS Objects Layout Driver for NFSv4.1 pNFS (EXPERIMENTAL)"
+	depends on NFS_FS && NFS_V4_1 && SCSI_OSD_ULD
+	help
+	  Say M here if you want your pNFS client to support the Objects Layout Driver.
+	  Requires the SCSI osd initiator library (SCSI_OSD_INITIATOR) and
+	  upper level driver (SCSI_OSD_ULD).
+
+	  If unsure, say N.
+
 config ROOT_NFS
 	bool "Root file system on NFS"
 	depends on NFS_FS=y && IP_PNP
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 7516a8a..6a34f7d 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -21,3 +21,5 @@ nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o
 
 obj-$(CONFIG_PNFS_FILE_LAYOUT) += nfs_layout_nfsv41_files.o
 nfs_layout_nfsv41_files-y := nfs4filelayout.o nfs4filelayoutdev.o
+
+obj-$(CONFIG_PNFS_OBJLAYOUT) += objlayout/
diff --git a/fs/nfs/objlayout/Kbuild b/fs/nfs/objlayout/Kbuild
new file mode 100644
index 0000000..2e5b9a4
--- /dev/null
+++ b/fs/nfs/objlayout/Kbuild
@@ -0,0 +1,5 @@
+#
+# Makefile for the pNFS Objects Layout Driver kernel module
+#
+objlayoutdriver-y := objio_osd.o
+obj-$(CONFIG_PNFS_OBJLAYOUT) += objlayoutdriver.o
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
new file mode 100644
index 0000000..379595f
--- /dev/null
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -0,0 +1,76 @@
+/*
+ *  pNFS Objects layout implementation over open-osd initiator library
+ *
+ *  Copyright (C) 2009 Panasas Inc. [year of first publication]
+ *  All rights reserved.
+ *
+ *  Benny Halevy <bhalevy@panasas.com>
+ *  Boaz Harrosh <bharrosh@panasas.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2
+ *  See the file COPYING included with this distribution for more details.
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ *  1. Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *  2. Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in the
+ *     documentation and/or other materials provided with the distribution.
+ *  3. Neither the name of the Panasas company nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ *  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ *  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ *  DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ *  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ *  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ *  BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ *  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ *  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ *  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/module.h>
+#include "../pnfs.h"
+
+static struct pnfs_layoutdriver_type objlayout_type = {
+	.id = LAYOUT_OSD2_OBJECTS,
+	.name = "LAYOUT_OSD2_OBJECTS",
+};
+
+MODULE_DESCRIPTION("pNFS Layout Driver for OSD2 objects");
+MODULE_AUTHOR("Benny Halevy <bhalevy@panasas.com>");
+MODULE_LICENSE("GPL");
+
+static int __init
+objlayout_init(void)
+{
+	int ret = pnfs_register_layoutdriver(&objlayout_type);
+
+	if (ret)
+		printk(KERN_INFO
+			"%s: Registering OSD pNFS Layout Driver failed: error=%d\n",
+			__func__, ret);
+	else
+		printk(KERN_INFO "%s: Registered OSD pNFS Layout Driver\n",
+			__func__);
+	return ret;
+}
+
+static void __exit
+objlayout_exit(void)
+{
+	pnfs_unregister_layoutdriver(&objlayout_type);
+	printk(KERN_INFO "%s: Unregistered OSD pNFS Layout Driver\n",
+	       __func__);
+}
+
+module_init(objlayout_init);
+module_exit(objlayout_exit);
-- 
1.7.3.4



* [PATCH v5 17/38] pnfs-obj: pnfs_osd XDR definitions
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (15 preceding siblings ...)
  2011-05-22 23:54 ` [PATCH v5 16/38] pnfs-obj: objlayoutdriver module skeleton Benny Halevy
@ 2011-05-22 23:55 ` Benny Halevy
  2011-05-22 23:56 ` [PATCH v5 18/38] pnfs-obj: pnfs_osd XDR client implementation Benny Halevy
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:55 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

* Add the pnfs_osd_xdr.h header

* Define the pnfs_osd_layout structure including all its
  sub-types and constants.
* Declare the pnfs_osd_xdr_encode/decode_layout API + all needed
  inline helpers.

* Define the pnfs_osd_deviceaddr structure and all its subtypes and
  constants.
* Declare API for encoding/decoding of a pnfs_osd_deviceaddr to/from
  XDR stream.

* Define the pnfs_osd_ioerr structure, its substructures and constants.
* Declare API for encoding/decoding of a pnfs_osd_ioerr to/from
  XDR stream.

* Define the pnfs_osd_layoutupdate structure and its substructures.
* Declare API for encoding/decoding of a pnfs_osd_layoutupdate to/from
  XDR stream.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
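
Note (not part of the commit): a minimal userspace sketch of how the
on-the-wire sizes of pnfs_osd_objid and pnfs_osd_ioerr add up, following
the XDR comments in the header below. All sketch_* names and
SKETCH_DEVICEID4_SIZE are hypothetical stand-ins for this illustration;
the only assumption carried over from the kernel is that a deviceid4 is
a 16-byte fixed opaque.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: same value as the kernel's NFS4_DEVICEID4_SIZE (16 bytes). */
#define SKETCH_DEVICEID4_SIZE 16

/*
 * XDR sizes are counted in 4-byte words. A fixed opaque of N bytes
 * takes N/4 words (N is a multiple of 4 here), and each uint64_t
 * ("hyper") takes two words.
 */
static size_t sketch_objid_xdr_words(void)
{
	return (SKETCH_DEVICEID4_SIZE / 4) /* oid_device_id */
		+ 2                        /* oid_partition_id */
		+ 2;                       /* oid_object_id */
}

/*
 * pnfs_osd_ioerr4 on the wire: the objid, two length4 hypers,
 * then a bool and an enum, each encoded as one 4-byte word.
 */
static size_t sketch_ioerr_xdr_bytes(void)
{
	return sketch_objid_xdr_words() * 4 /* oer_component */
		+ 8                         /* oer_comp_offset */
		+ 8                         /* oer_comp_length */
		+ 4                         /* oer_iswrite */
		+ 4;                        /* oer_errno */
}
```

Under these assumptions the objid comes to 8 words (32 bytes) and the
ioerr record to 56 bytes, which is the fixed-size part a caller would
need to reserve before encoding one error report.
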
 include/linux/pnfs_osd_xdr.h |  350 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 350 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/pnfs_osd_xdr.h

diff --git a/include/linux/pnfs_osd_xdr.h b/include/linux/pnfs_osd_xdr.h
new file mode 100644
index 0000000..747d06d
--- /dev/null
+++ b/include/linux/pnfs_osd_xdr.h
@@ -0,0 +1,350 @@
+/*
+ *  pNFS-osd on-the-wire data structures
+ *
+ *  Copyright (C) 2007 Panasas Inc. [year of first publication]
+ *  All rights reserved.
+ *
+ *  Benny Halevy <bhalevy@panasas.com>
+ *  Boaz Harrosh <bharrosh@panasas.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2
+ *  See the file COPYING included with this distribution for more details.
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ *  1. Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *  2. Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in the
+ *     documentation and/or other materials provided with the distribution.
+ *  3. Neither the name of the Panasas company nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ *  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ *  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ *  DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ *  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ *  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ *  BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ *  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ *  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ *  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef __PNFS_OSD_XDR_H__
+#define __PNFS_OSD_XDR_H__
+
+#include <linux/nfs_fs.h>
+#include <linux/nfs_page.h>
+#include <scsi/osd_protocol.h>
+
+#define PNFS_OSD_OSDNAME_MAXSIZE 256
+
+/*
+ * draft-ietf-nfsv4-minorversion-22
+ * draft-ietf-nfsv4-pnfs-obj-12
+ */
+
+/* Layout Structure */
+
+enum pnfs_osd_raid_algorithm4 {
+	PNFS_OSD_RAID_0		= 1,
+	PNFS_OSD_RAID_4		= 2,
+	PNFS_OSD_RAID_5		= 3,
+	PNFS_OSD_RAID_PQ	= 4     /* Reed-Solomon P+Q */
+};
+
+/*   struct pnfs_osd_data_map4 {
+ *       uint32_t                    odm_num_comps;
+ *       length4                     odm_stripe_unit;
+ *       uint32_t                    odm_group_width;
+ *       uint32_t                    odm_group_depth;
+ *       uint32_t                    odm_mirror_cnt;
+ *       pnfs_osd_raid_algorithm4    odm_raid_algorithm;
+ *   };
+ */
+struct pnfs_osd_data_map {
+	u32	odm_num_comps;
+	u64	odm_stripe_unit;
+	u32	odm_group_width;
+	u32	odm_group_depth;
+	u32	odm_mirror_cnt;
+	u32	odm_raid_algorithm;
+};
+
+/*   struct pnfs_osd_objid4 {
+ *       deviceid4       oid_device_id;
+ *       uint64_t        oid_partition_id;
+ *       uint64_t        oid_object_id;
+ *   };
+ */
+struct pnfs_osd_objid {
+	struct nfs4_deviceid	oid_device_id;
+	u64			oid_partition_id;
+	u64			oid_object_id;
+};
+
+/* For printout. I use:
+ * printk("dev(%llx:%llx)", _DEVID_LO(pointer), _DEVID_HI(pointer));
+ * BE style
+ */
+#define _DEVID_LO(oid_device_id) \
+	(unsigned long long)be64_to_cpup((__be64 *)(oid_device_id)->data)
+
+#define _DEVID_HI(oid_device_id) \
+	(unsigned long long)be64_to_cpup(((__be64 *)(oid_device_id)->data) + 1)
+
+static inline int
+pnfs_osd_objid_xdr_sz(void)
+{
+	return (NFS4_DEVICEID4_SIZE / 4) + 2 + 2;
+}
+
+enum pnfs_osd_version {
+	PNFS_OSD_MISSING              = 0,
+	PNFS_OSD_VERSION_1            = 1,
+	PNFS_OSD_VERSION_2            = 2
+};
+
+struct pnfs_osd_opaque_cred {
+	u32 cred_len;
+	void *cred;
+};
+
+enum pnfs_osd_cap_key_sec {
+	PNFS_OSD_CAP_KEY_SEC_NONE     = 0,
+	PNFS_OSD_CAP_KEY_SEC_SSV      = 1,
+};
+
+/*   struct pnfs_osd_object_cred4 {
+ *       pnfs_osd_objid4         oc_object_id;
+ *       pnfs_osd_version4       oc_osd_version;
+ *       pnfs_osd_cap_key_sec4   oc_cap_key_sec;
+ *       opaque                  oc_capability_key<>;
+ *       opaque                  oc_capability<>;
+ *   };
+ */
+struct pnfs_osd_object_cred {
+	struct pnfs_osd_objid		oc_object_id;
+	u32				oc_osd_version;
+	u32				oc_cap_key_sec;
+	struct pnfs_osd_opaque_cred	oc_cap_key;
+	struct pnfs_osd_opaque_cred	oc_cap;
+};
+
+/*   struct pnfs_osd_layout4 {
+ *       pnfs_osd_data_map4      olo_map;
+ *       uint32_t                olo_comps_index;
+ *       pnfs_osd_object_cred4   olo_components<>;
+ *   };
+ */
+struct pnfs_osd_layout {
+	struct pnfs_osd_data_map	olo_map;
+	u32				olo_comps_index;
+	u32				olo_num_comps;
+	struct pnfs_osd_object_cred	*olo_comps;
+};
+
+/* Device Address */
+enum pnfs_osd_targetid_type {
+	OBJ_TARGET_ANON = 1,
+	OBJ_TARGET_SCSI_NAME = 2,
+	OBJ_TARGET_SCSI_DEVICE_ID = 3,
+};
+
+/*   union pnfs_osd_targetid4 switch (pnfs_osd_targetid_type4 oti_type) {
+ *       case OBJ_TARGET_SCSI_NAME:
+ *           string              oti_scsi_name<>;
+ *
+ *       case OBJ_TARGET_SCSI_DEVICE_ID:
+ *           opaque              oti_scsi_device_id<>;
+ *
+ *       default:
+ *           void;
+ *   };
+ *
+ *   union pnfs_osd_targetaddr4 switch (bool ota_available) {
+ *       case TRUE:
+ *           netaddr4            ota_netaddr;
+ *       case FALSE:
+ *           void;
+ *   };
+ *
+ *   struct pnfs_osd_deviceaddr4 {
+ *       pnfs_osd_targetid4      oda_targetid;
+ *       pnfs_osd_targetaddr4    oda_targetaddr;
+ *       uint64_t                oda_lun;
+ *       opaque                  oda_systemid<>;
+ *       pnfs_osd_object_cred4   oda_root_obj_cred;
+ *       opaque                  oda_osdname<>;
+ *   };
+ */
+struct pnfs_osd_targetid {
+	u32				oti_type;
+	struct nfs4_string		oti_scsi_device_id;
+};
+
+enum { PNFS_OSD_TARGETID_MAX = 1 + PNFS_OSD_OSDNAME_MAXSIZE / 4 };
+
+/*   struct netaddr4 {
+ *       // see struct rpcb in RFC1833
+ *       string r_netid<>;    // network id
+ *       string r_addr<>;     // universal address
+ *   };
+ */
+struct pnfs_osd_net_addr {
+	struct nfs4_string	r_netid;
+	struct nfs4_string	r_addr;
+};
+
+struct pnfs_osd_targetaddr {
+	u32				ota_available;
+	struct pnfs_osd_net_addr	ota_netaddr;
+};
+
+enum {
+	NETWORK_ID_MAX = 16 / 4,
+	UNIVERSAL_ADDRESS_MAX = 64 / 4,
+	PNFS_OSD_TARGETADDR_MAX = 3 +  NETWORK_ID_MAX + UNIVERSAL_ADDRESS_MAX,
+};
+
+struct pnfs_osd_deviceaddr {
+	struct pnfs_osd_targetid	oda_targetid;
+	struct pnfs_osd_targetaddr	oda_targetaddr;
+	u8				oda_lun[8];
+	struct nfs4_string		oda_systemid;
+	struct pnfs_osd_object_cred	oda_root_obj_cred;
+	struct nfs4_string		oda_osdname;
+};
+
+enum {
+	ODA_OSDNAME_MAX = PNFS_OSD_OSDNAME_MAXSIZE / 4,
+	PNFS_OSD_DEVICEADDR_MAX =
+		PNFS_OSD_TARGETID_MAX + PNFS_OSD_TARGETADDR_MAX +
+		2 /*oda_lun*/ +
+		1 + OSD_SYSTEMID_LEN +
+		1 + ODA_OSDNAME_MAX,
+};
+
+/* LAYOUTCOMMIT: layoutupdate */
+
+/*   union pnfs_osd_deltaspaceused4 switch (bool dsu_valid) {
+ *       case TRUE:
+ *           int64_t     dsu_delta;
+ *       case FALSE:
+ *           void;
+ *   };
+ *
+ *   struct pnfs_osd_layoutupdate4 {
+ *       pnfs_osd_deltaspaceused4    olu_delta_space_used;
+ *       bool                        olu_ioerr_flag;
+ *   };
+ */
+struct pnfs_osd_layoutupdate {
+	u32	dsu_valid;
+	s64	dsu_delta;
+	u32	olu_ioerr_flag;
+};
+
+/* LAYOUTRETURN: I/O Error Report */
+
+enum pnfs_osd_errno {
+	PNFS_OSD_ERR_EIO		= 1,
+	PNFS_OSD_ERR_NOT_FOUND		= 2,
+	PNFS_OSD_ERR_NO_SPACE		= 3,
+	PNFS_OSD_ERR_BAD_CRED		= 4,
+	PNFS_OSD_ERR_NO_ACCESS		= 5,
+	PNFS_OSD_ERR_UNREACHABLE	= 6,
+	PNFS_OSD_ERR_RESOURCE		= 7
+};
+
+/*   struct pnfs_osd_ioerr4 {
+ *       pnfs_osd_objid4     oer_component;
+ *       length4             oer_comp_offset;
+ *       length4             oer_comp_length;
+ *       bool                oer_iswrite;
+ *       pnfs_osd_errno4     oer_errno;
+ *   };
+ */
+struct pnfs_osd_ioerr {
+	struct pnfs_osd_objid	oer_component;
+	u64			oer_comp_offset;
+	u64			oer_comp_length;
+	u32			oer_iswrite;
+	u32			oer_errno;
+};
+
+/* OSD XDR API */
+/* Layout helpers */
+/* Layout decoding is done in two parts:
+ * 1. First Call pnfs_osd_xdr_decode_layout_map to read in only the header part
+ *    of the layout. @iter members need not be initialized.
+ *    Returned:
+ *             @layout members are set. (@layout->olo_comps set to NULL).
+ *
+ *             Zero on success, or negative error if passed xdr is broken.
+ *
+ * 2. 2nd Call pnfs_osd_xdr_decode_layout_comp() in a loop until it returns
+ *    false, to decode the next component.
+ *    Returned:
+ *       true if there is more to decode or false if we are done or error.
+ *
+ * Example:
+ *	struct pnfs_osd_xdr_decode_layout_iter iter;
+ *	struct pnfs_osd_layout layout;
+ *	struct pnfs_osd_object_cred comp;
+ *	int status;
+ *
+ *	status = pnfs_osd_xdr_decode_layout_map(&layout, &iter, xdr);
+ *	if (unlikely(status))
+ *		goto err;
+ *	while(pnfs_osd_xdr_decode_layout_comp(&comp, &iter, xdr, &status)) {
+ *		// All of @comp strings point to inside the xdr_buffer
+ *		// or scratch buffer. Copy them out to user memory, e.g.
+ *		copy_single_comp(dest_comp++, &comp);
+ *	}
+ *	if (unlikely(status))
+ *		goto err;
+ */
+
+struct pnfs_osd_xdr_decode_layout_iter {
+	unsigned total_comps;
+	unsigned decoded_comps;
+};
+
+extern int pnfs_osd_xdr_decode_layout_map(struct pnfs_osd_layout *layout,
+	struct pnfs_osd_xdr_decode_layout_iter *iter, struct xdr_stream *xdr);
+
+extern bool pnfs_osd_xdr_decode_layout_comp(struct pnfs_osd_object_cred *comp,
+	struct pnfs_osd_xdr_decode_layout_iter *iter, struct xdr_stream *xdr,
+	int *err);
+
+/* Device Info helpers */
+
+/* Note: All strings inside @deviceaddr point to space inside @p.
+ * @p should stay valid while @deviceaddr is in use.
+ */
+extern void pnfs_osd_xdr_decode_deviceaddr(
+	struct pnfs_osd_deviceaddr *deviceaddr, __be32 *p);
+
+/* layoutupdate (layout_commit) xdr helpers */
+extern int
+pnfs_osd_xdr_encode_layoutupdate(struct xdr_stream *xdr,
+				 struct pnfs_osd_layoutupdate *lou);
+extern __be32 *
+pnfs_osd_xdr_decode_layoutupdate(struct pnfs_osd_layoutupdate *lou, __be32 *p);
+
+/* osd_ioerror encoding/decoding (layout_return) */
+/* Client */
+extern __be32 *pnfs_osd_xdr_ioerr_reserve_space(struct xdr_stream *xdr);
+extern void pnfs_osd_xdr_encode_ioerr(__be32 *p, struct pnfs_osd_ioerr *ioerr);
+/* Server*/
+extern __be32 *
+pnfs_osd_xdr_decode_ioerr(struct pnfs_osd_ioerr *ioerr, __be32 *p);
+
+#endif /* __PNFS_OSD_XDR_H__ */
-- 
1.7.3.4



* [PATCH v5 18/38] pnfs-obj: pnfs_osd XDR client implementation
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (16 preceding siblings ...)
  2011-05-22 23:55 ` [PATCH v5 17/38] pnfs-obj: pnfs_osd XDR definitions Benny Halevy
@ 2011-05-22 23:56 ` Benny Halevy
  2011-05-22 23:57 ` [PATCH v5 19/38] pnfs-obj: decode layout, alloc/free lseg Benny Halevy
                   ` (20 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:56 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

From: Boaz Harrosh <bharrosh@panasas.com>

* Add the fs/nfs/objlayout/pnfs_osd_xdr_cli.c file, which will
  include the XDR encode/decode implementations for the pNFS
  client objlayout driver.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
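
Note (not part of the commit): a rough userspace sketch of the decoding
pattern used for the data map in this patch, reading big-endian fields
in order from a flat buffer. It is only an illustration under stated
assumptions: the sketch_* names are hypothetical, the struct mirrors the
fields of pnfs_osd_data_map, and a plain byte buffer with an explicit
length check stands in for the kernel's xdr_stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of struct pnfs_osd_data_map. */
struct sketch_data_map {
	uint32_t num_comps;
	uint64_t stripe_unit;
	uint32_t group_width;
	uint32_t group_depth;
	uint32_t mirror_cnt;
	uint32_t raid_algorithm;
};

/* Read one big-endian 32-bit word and return the advanced pointer. */
static const uint8_t *sketch_get_u32(const uint8_t *p, uint32_t *v)
{
	*v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	     ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	return p + 4;
}

/* An XDR hyper is two consecutive words, most significant first. */
static const uint8_t *sketch_get_u64(const uint8_t *p, uint64_t *v)
{
	uint32_t hi, lo;

	p = sketch_get_u32(p, &hi);
	p = sketch_get_u32(p, &lo);
	*v = ((uint64_t)hi << 32) | lo;
	return p;
}

/* Returns 0 on success, -1 if the buffer is too short for the map. */
static int sketch_decode_data_map(const uint8_t *buf, size_t len,
				  struct sketch_data_map *m)
{
	const uint8_t *p = buf;

	if (len < 4 + 8 + 4 + 4 + 4 + 4)
		return -1;

	p = sketch_get_u32(p, &m->num_comps);
	p = sketch_get_u64(p, &m->stripe_unit);
	p = sketch_get_u32(p, &m->group_width);
	p = sketch_get_u32(p, &m->group_depth);
	p = sketch_get_u32(p, &m->mirror_cnt);
	p = sketch_get_u32(p, &m->raid_algorithm);
	return 0;
}
```

The length check up front plays the role that the NULL test after
xdr_inline_decode() plays in the patch: nothing is read until the whole
fixed-size region is known to be present.
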
 fs/nfs/objlayout/Kbuild             |    2 +-
 fs/nfs/objlayout/pnfs_osd_xdr_cli.c |  412 +++++++++++++++++++++++++++++++++++
 2 files changed, 413 insertions(+), 1 deletions(-)
 create mode 100644 fs/nfs/objlayout/pnfs_osd_xdr_cli.c

diff --git a/fs/nfs/objlayout/Kbuild b/fs/nfs/objlayout/Kbuild
index 2e5b9a4..7b2a5a2 100644
--- a/fs/nfs/objlayout/Kbuild
+++ b/fs/nfs/objlayout/Kbuild
@@ -1,5 +1,5 @@
 #
 # Makefile for the pNFS Objects Layout Driver kernel module
 #
-objlayoutdriver-y := objio_osd.o
+objlayoutdriver-y := objio_osd.o pnfs_osd_xdr_cli.o
 obj-$(CONFIG_PNFS_OBJLAYOUT) += objlayoutdriver.o
diff --git a/fs/nfs/objlayout/pnfs_osd_xdr_cli.c b/fs/nfs/objlayout/pnfs_osd_xdr_cli.c
new file mode 100644
index 0000000..dd3052c
--- /dev/null
+++ b/fs/nfs/objlayout/pnfs_osd_xdr_cli.c
@@ -0,0 +1,412 @@
+/*
+ *  Object-Based pNFS Layout XDR layer
+ *
+ *  Copyright (C) 2007 Panasas Inc. [year of first publication]
+ *  All rights reserved.
+ *
+ *  Benny Halevy <bhalevy@panasas.com>
+ *  Boaz Harrosh <bharrosh@panasas.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2
+ *  See the file COPYING included with this distribution for more details.
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ *  1. Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *  2. Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in the
+ *     documentation and/or other materials provided with the distribution.
+ *  3. Neither the name of the Panasas company nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ *  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ *  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ *  DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ *  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ *  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ *  BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ *  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ *  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ *  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/pnfs_osd_xdr.h>
+
+#define NFSDBG_FACILITY         NFSDBG_PNFS_LD
+
+/*
+ * The following implementation is based on RFC5664
+ */
+
+/*
+ * struct pnfs_osd_objid {
+ *	struct pnfs_deviceid	oid_device_id;
+ *	u64			oid_partition_id;
+ *	u64			oid_object_id;
+ * }; // xdr size 32 bytes
+ */
+static __be32 *
+_osd_xdr_decode_objid(__be32 *p, struct pnfs_osd_objid *objid)
+{
+	p = xdr_decode_opaque_fixed(p, objid->oid_device_id.data,
+				    sizeof(objid->oid_device_id.data));
+
+	p = xdr_decode_hyper(p, &objid->oid_partition_id);
+	p = xdr_decode_hyper(p, &objid->oid_object_id);
+	return p;
+}
+/*
+ * struct pnfs_osd_opaque_cred {
+ *	u32 cred_len;
+ *	void *cred;
+ * }; // xdr size [variable]
+ * The return pointers are from the xdr buffer
+ */
+static int
+_osd_xdr_decode_opaque_cred(struct pnfs_osd_opaque_cred *opaque_cred,
+			    struct xdr_stream *xdr)
+{
+	__be32 *p = xdr_inline_decode(xdr, 4);
+
+	if (!p)
+		return -EINVAL;
+
+	opaque_cred->cred_len = be32_to_cpu(*p++);
+
+	p = xdr_inline_decode(xdr, opaque_cred->cred_len);
+	if (!p)
+		return -EINVAL;
+
+	opaque_cred->cred = p;
+	return 0;
+}
+
+/*
+ * struct pnfs_osd_object_cred {
+ *	struct pnfs_osd_objid		oc_object_id;
+ *	u32				oc_osd_version;
+ *	u32				oc_cap_key_sec;
+ *	struct pnfs_osd_opaque_cred	oc_cap_key
+ *	struct pnfs_osd_opaque_cred	oc_cap;
+ * }; // xdr size 32 + 4 + 4 + [variable] + [variable]
+ */
+static int
+_osd_xdr_decode_object_cred(struct pnfs_osd_object_cred *comp,
+			    struct xdr_stream *xdr)
+{
+	__be32 *p = xdr_inline_decode(xdr, 32 + 4 + 4);
+	int ret;
+
+	if (!p)
+		return -EIO;
+
+	p = _osd_xdr_decode_objid(p, &comp->oc_object_id);
+	comp->oc_osd_version = be32_to_cpup(p++);
+	comp->oc_cap_key_sec = be32_to_cpup(p);
+
+	ret = _osd_xdr_decode_opaque_cred(&comp->oc_cap_key, xdr);
+	if (unlikely(ret))
+		return ret;
+
+	ret = _osd_xdr_decode_opaque_cred(&comp->oc_cap, xdr);
+	return ret;
+}
+
+/*
+ * struct pnfs_osd_data_map {
+ *	u32	odm_num_comps;
+ *	u64	odm_stripe_unit;
+ *	u32	odm_group_width;
+ *	u32	odm_group_depth;
+ *	u32	odm_mirror_cnt;
+ *	u32	odm_raid_algorithm;
+ * }; // xdr size 4 + 8 + 4 + 4 + 4 + 4
+ */
+static inline int
+_osd_data_map_xdr_sz(void)
+{
+	return 4 + 8 + 4 + 4 + 4 + 4;
+}
+
+static __be32 *
+_osd_xdr_decode_data_map(__be32 *p, struct pnfs_osd_data_map *data_map)
+{
+	data_map->odm_num_comps = be32_to_cpup(p++);
+	p = xdr_decode_hyper(p, &data_map->odm_stripe_unit);
+	data_map->odm_group_width = be32_to_cpup(p++);
+	data_map->odm_group_depth = be32_to_cpup(p++);
+	data_map->odm_mirror_cnt = be32_to_cpup(p++);
+	data_map->odm_raid_algorithm = be32_to_cpup(p++);
+	dprintk("%s: odm_num_comps=%u odm_stripe_unit=%llu odm_group_width=%u "
+		"odm_group_depth=%u odm_mirror_cnt=%u odm_raid_algorithm=%u\n",
+		__func__,
+		data_map->odm_num_comps,
+		(unsigned long long)data_map->odm_stripe_unit,
+		data_map->odm_group_width,
+		data_map->odm_group_depth,
+		data_map->odm_mirror_cnt,
+		data_map->odm_raid_algorithm);
+	return p;
+}
+
+int pnfs_osd_xdr_decode_layout_map(struct pnfs_osd_layout *layout,
+	struct pnfs_osd_xdr_decode_layout_iter *iter, struct xdr_stream *xdr)
+{
+	__be32 *p;
+
+	memset(iter, 0, sizeof(*iter));
+
+	p = xdr_inline_decode(xdr, _osd_data_map_xdr_sz() + 4 + 4);
+	if (unlikely(!p))
+		return -EINVAL;
+
+	p = _osd_xdr_decode_data_map(p, &layout->olo_map);
+	layout->olo_comps_index = be32_to_cpup(p++);
+	layout->olo_num_comps = be32_to_cpup(p++);
+	iter->total_comps = layout->olo_num_comps;
+	return 0;
+}
+
+bool pnfs_osd_xdr_decode_layout_comp(struct pnfs_osd_object_cred *comp,
+	struct pnfs_osd_xdr_decode_layout_iter *iter, struct xdr_stream *xdr,
+	int *err)
+{
+	BUG_ON(iter->decoded_comps > iter->total_comps);
+	if (iter->decoded_comps == iter->total_comps)
+		return false;
+
+	*err = _osd_xdr_decode_object_cred(comp, xdr);
+	if (unlikely(*err)) {
+		dprintk("%s: _osd_xdr_decode_object_cred=>%d decoded_comps=%d "
+			"total_comps=%d\n", __func__, *err,
+			iter->decoded_comps, iter->total_comps);
+		return false; /* stop the loop */
+	}
+	dprintk("%s: dev(%llx:%llx) par=0x%llx obj=0x%llx "
+		"key_len=%u cap_len=%u\n",
+		__func__,
+		_DEVID_LO(&comp->oc_object_id.oid_device_id),
+		_DEVID_HI(&comp->oc_object_id.oid_device_id),
+		comp->oc_object_id.oid_partition_id,
+		comp->oc_object_id.oid_object_id,
+		comp->oc_cap_key.cred_len, comp->oc_cap.cred_len);
+
+	iter->decoded_comps++;
+	return true;
+}
+
+/*
+ * Get Device Information Decoding
+ *
+ * Note: since Device Information is currently done synchronously, all
+ *       variable strings fields are left inside the rpc buffer and are only
+ *       pointed to by the pnfs_osd_deviceaddr members. So the read buffer
+ *       should not be freed while the returned information is in use.
+ */
+/*
+ *struct nfs4_string {
+ *	unsigned int len;
+ *	char *data;
+ *}; // size [variable]
+ * NOTE: Returned string points to inside the XDR buffer
+ */
+static __be32 *
+__read_u8_opaque(__be32 *p, struct nfs4_string *str)
+{
+	str->len = be32_to_cpup(p++);
+	str->data = (char *)p;
+
+	p += XDR_QUADLEN(str->len);
+	return p;
+}
+
+/*
+ * struct pnfs_osd_targetid {
+ *	u32			oti_type;
+ *	struct nfs4_string	oti_scsi_device_id;
+ * };// size 4 + [variable]
+ */
+static __be32 *
+__read_targetid(__be32 *p, struct pnfs_osd_targetid* targetid)
+{
+	u32 oti_type;
+
+	oti_type = be32_to_cpup(p++);
+	targetid->oti_type = oti_type;
+
+	switch (oti_type) {
+	case OBJ_TARGET_SCSI_NAME:
+	case OBJ_TARGET_SCSI_DEVICE_ID:
+		p = __read_u8_opaque(p, &targetid->oti_scsi_device_id);
+	}
+
+	return p;
+}
+
+/*
+ * struct pnfs_osd_net_addr {
+ *	struct nfs4_string	r_netid;
+ *	struct nfs4_string	r_addr;
+ * };
+ */
+static __be32 *
+__read_net_addr(__be32 *p, struct pnfs_osd_net_addr* netaddr)
+{
+	p = __read_u8_opaque(p, &netaddr->r_netid);
+	p = __read_u8_opaque(p, &netaddr->r_addr);
+
+	return p;
+}
+
+/*
+ * struct pnfs_osd_targetaddr {
+ *	u32				ota_available;
+ *	struct pnfs_osd_net_addr	ota_netaddr;
+ * };
+ */
+static __be32 *
+__read_targetaddr(__be32 *p, struct pnfs_osd_targetaddr *targetaddr)
+{
+	u32 ota_available;
+
+	ota_available = be32_to_cpup(p++);
+	targetaddr->ota_available = ota_available;
+
+	if (ota_available)
+		p = __read_net_addr(p, &targetaddr->ota_netaddr);
+
+
+	return p;
+}
+
+/*
+ * struct pnfs_osd_deviceaddr {
+ *	struct pnfs_osd_targetid	oda_targetid;
+ *	struct pnfs_osd_targetaddr	oda_targetaddr;
+ *	u8				oda_lun[8];
+ *	struct nfs4_string		oda_systemid;
+ *	struct pnfs_osd_object_cred	oda_root_obj_cred;
+ *	struct nfs4_string		oda_osdname;
+ * };
+ */
+
+/* We need this version for the pnfs_osd_xdr_decode_deviceaddr which does
+ * not have an xdr_stream
+ */
+static __be32 *
+__read_opaque_cred(__be32 *p,
+			      struct pnfs_osd_opaque_cred *opaque_cred)
+{
+	opaque_cred->cred_len = be32_to_cpu(*p++);
+	opaque_cred->cred = p;
+	return p + XDR_QUADLEN(opaque_cred->cred_len);
+}
+
+static __be32 *
+__read_object_cred(__be32 *p, struct pnfs_osd_object_cred *comp)
+{
+	p = _osd_xdr_decode_objid(p, &comp->oc_object_id);
+	comp->oc_osd_version = be32_to_cpup(p++);
+	comp->oc_cap_key_sec = be32_to_cpup(p++);
+
+	p = __read_opaque_cred(p, &comp->oc_cap_key);
+	p = __read_opaque_cred(p, &comp->oc_cap);
+	return p;
+}
+
+void pnfs_osd_xdr_decode_deviceaddr(
+	struct pnfs_osd_deviceaddr *deviceaddr, __be32 *p)
+{
+	p = __read_targetid(p, &deviceaddr->oda_targetid);
+
+	p = __read_targetaddr(p, &deviceaddr->oda_targetaddr);
+
+	p = xdr_decode_opaque_fixed(p, deviceaddr->oda_lun,
+				    sizeof(deviceaddr->oda_lun));
+
+	p = __read_u8_opaque(p, &deviceaddr->oda_systemid);
+
+	p = __read_object_cred(p, &deviceaddr->oda_root_obj_cred);
+
+	p = __read_u8_opaque(p, &deviceaddr->oda_osdname);
+
+	/* libosd likes this terminated in dbg. It's last, so no problems */
+	deviceaddr->oda_osdname.data[deviceaddr->oda_osdname.len] = 0;
+}
+
+/*
+ * struct pnfs_osd_layoutupdate {
+ *	u32	dsu_valid;
+ *	s64	dsu_delta;
+ *	u32	olu_ioerr_flag;
+ * }; xdr size 4 + 8 + 4
+ */
+int
+pnfs_osd_xdr_encode_layoutupdate(struct xdr_stream *xdr,
+				 struct pnfs_osd_layoutupdate *lou)
+{
+	__be32 *p = xdr_reserve_space(xdr,  4 + 8 + 4);
+
+	if (!p)
+		return -E2BIG;
+
+	*p++ = cpu_to_be32(lou->dsu_valid);
+	if (lou->dsu_valid)
+		p = xdr_encode_hyper(p, lou->dsu_delta);
+	*p++ = cpu_to_be32(lou->olu_ioerr_flag);
+	return 0;
+}
+
+/*
+ * struct pnfs_osd_objid {
+ *	struct pnfs_deviceid	oid_device_id;
+ *	u64			oid_partition_id;
+ *	u64			oid_object_id;
+ * }; // xdr size 32 bytes
+ */
+static inline __be32 *
+pnfs_osd_xdr_encode_objid(__be32 *p, struct pnfs_osd_objid *object_id)
+{
+	p = xdr_encode_opaque_fixed(p, &object_id->oid_device_id.data,
+				    sizeof(object_id->oid_device_id.data));
+	p = xdr_encode_hyper(p, object_id->oid_partition_id);
+	p = xdr_encode_hyper(p, object_id->oid_object_id);
+
+	return p;
+}
+
+/*
+ * struct pnfs_osd_ioerr {
+ *	struct pnfs_osd_objid	oer_component;
+ *	u64			oer_comp_offset;
+ *	u64			oer_comp_length;
+ *	u32			oer_iswrite;
+ *	u32			oer_errno;
+ * }; // xdr size 32 + 24 bytes
+ */
+void pnfs_osd_xdr_encode_ioerr(__be32 *p, struct pnfs_osd_ioerr *ioerr)
+{
+	p = pnfs_osd_xdr_encode_objid(p, &ioerr->oer_component);
+	p = xdr_encode_hyper(p, ioerr->oer_comp_offset);
+	p = xdr_encode_hyper(p, ioerr->oer_comp_length);
+	*p++ = cpu_to_be32(ioerr->oer_iswrite);
+	*p   = cpu_to_be32(ioerr->oer_errno);
+}
+
+__be32 *pnfs_osd_xdr_ioerr_reserve_space(struct xdr_stream *xdr)
+{
+	__be32 *p;
+
+	p = xdr_reserve_space(xdr, 32 + 24);
+	if (unlikely(!p))
+		dprintk("%s: out of xdr space\n", __func__);
+
+	return p;
+}
-- 
1.7.3.4



* [PATCH v5 19/38] pnfs-obj: decode layout, alloc/free lseg
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (17 preceding siblings ...)
  2011-05-22 23:56 ` [PATCH v5 18/38] pnfs-obj: pnfs_osd XDR client implementation Benny Halevy
@ 2011-05-22 23:57 ` Benny Halevy
  2011-05-22 23:57 ` [PATCH v5 20/38] pnfs: per mount layout driver private data Benny Halevy
                   ` (19 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:57 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

From: Boaz Harrosh <bharrosh@panasas.com>

objlayout_alloc_lseg prepares an xdr_stream and calls the
raid engine's objio_alloc_lseg() to allocate a private
pnfs_layout_segment.

objio_osd.c::objio_alloc_lseg() uses the passed xdr_stream to
decode and store the layout_segment information in an
objio_segment struct, using the pnfs_osd_xdr.h API for
the actual parsing of the layout xdr.

objlayout_free_lseg calls objio_free_lseg() to free the
allocated space.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[gfp_flags]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
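
Note (not part of the commit): a small userspace sketch of the stripe
geometry that objio_alloc_lseg() derives from the decoded data map,
combined with the divisibility check from _verify_data_map(). The
sketch_* names are hypothetical, and the guard against a wrapped
mirror count is an extra assumption of this sketch, not something the
patch does.

```c
#include <stdint.h>

/* Hypothetical result type; field names follow struct objio_segment. */
struct sketch_geometry {
	uint32_t mirrors_p1;   /* mirror count plus the primary copy */
	uint32_t group_width;  /* data stripe units per group */
	uint32_t group_count;  /* number of groups in the layout */
};

/*
 * Returns 0 on success, -1 if num_comps is not a whole number of
 * mirror sets (or if mirror_cnt + 1 wraps to zero).
 */
static int sketch_compute_geometry(uint32_t num_comps, uint32_t mirror_cnt,
				   uint32_t odm_group_width,
				   struct sketch_geometry *g)
{
	uint32_t mirrors_p1 = mirror_cnt + 1;

	if (mirrors_p1 == 0 || num_comps % mirrors_p1 != 0)
		return -1;

	g->mirrors_p1 = mirrors_p1;
	if (odm_group_width) {
		/* Grouped layout: the server states the width explicitly. */
		g->group_width = odm_group_width;
		g->group_count = num_comps / mirrors_p1 / odm_group_width;
	} else {
		/* No grouping: one group spanning every data component. */
		g->group_width = num_comps / mirrors_p1;
		g->group_count = 1;
	}
	return 0;
}
```

For example, 8 components with one mirror and a group width of 2 give
mirrors_p1 = 2 and two groups of width 2, while 6 components with one
mirror and no grouping give a single group of width 3.
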
 fs/nfs/objlayout/Kbuild      |    2 +-
 fs/nfs/objlayout/objio_osd.c |  169 +++++++++++++++++++++++++++++++++++++++++-
 fs/nfs/objlayout/objlayout.c |  104 ++++++++++++++++++++++++++
 fs/nfs/objlayout/objlayout.h |   67 +++++++++++++++++
 4 files changed, 340 insertions(+), 2 deletions(-)
 create mode 100644 fs/nfs/objlayout/objlayout.c
 create mode 100644 fs/nfs/objlayout/objlayout.h

diff --git a/fs/nfs/objlayout/Kbuild b/fs/nfs/objlayout/Kbuild
index 7b2a5a2..ed30ea0 100644
--- a/fs/nfs/objlayout/Kbuild
+++ b/fs/nfs/objlayout/Kbuild
@@ -1,5 +1,5 @@
 #
 # Makefile for the pNFS Objects Layout Driver kernel module
 #
-objlayoutdriver-y := objio_osd.o pnfs_osd_xdr_cli.o
+objlayoutdriver-y := objio_osd.o pnfs_osd_xdr_cli.o objlayout.o
 obj-$(CONFIG_PNFS_OBJLAYOUT) += objlayoutdriver.o
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 379595f..5a43fbe 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -38,11 +38,178 @@
  */
 
 #include <linux/module.h>
-#include "../pnfs.h"
+#include <scsi/osd_initiator.h>
+
+#include "objlayout.h"
+
+#define NFSDBG_FACILITY         NFSDBG_PNFS_LD
+
+#define _LLU(x) ((unsigned long long)(x))
+
+struct caps_buffers {
+	u8 caps_key[OSD_CRYPTO_KEYID_SIZE];
+	u8 creds[OSD_CAP_LEN];
+};
+
+struct objio_segment {
+	struct pnfs_layout_segment lseg;
+
+	struct pnfs_osd_object_cred *comps;
+
+	unsigned mirrors_p1;
+	unsigned stripe_unit;
+	unsigned group_width;	/* Data stripe_units without integrity comps */
+	u64 group_depth;
+	unsigned group_count;
+
+	unsigned comps_index;
+	unsigned num_comps;
+	/* variable length */
+	struct osd_dev	*ods[1];
+};
+
+static inline struct objio_segment *
+OBJIO_LSEG(struct pnfs_layout_segment *lseg)
+{
+	return container_of(lseg, struct objio_segment, lseg);
+}
+
+static int _verify_data_map(struct pnfs_osd_layout *layout)
+{
+	struct pnfs_osd_data_map *data_map = &layout->olo_map;
+	u64 stripe_length;
+	u32 group_width;
+
+/* FIXME: Only raid0 for now. if not go through MDS */
+	if (data_map->odm_raid_algorithm != PNFS_OSD_RAID_0) {
+		printk(KERN_ERR "Only RAID_0 for now\n");
+		return -ENOTSUPP;
+	}
+	if (0 != (data_map->odm_num_comps % (data_map->odm_mirror_cnt + 1))) {
+		printk(KERN_ERR "Data Map wrong, num_comps=%u mirrors=%u\n",
+			  data_map->odm_num_comps, data_map->odm_mirror_cnt);
+		return -EINVAL;
+	}
+
+	if (data_map->odm_group_width)
+		group_width = data_map->odm_group_width;
+	else
+		group_width = data_map->odm_num_comps /
+						(data_map->odm_mirror_cnt + 1);
+
+	stripe_length = (u64)data_map->odm_stripe_unit * group_width;
+	if (stripe_length >= (1ULL << 32)) {
+		printk(KERN_ERR "Total Stripe length(0x%llx)"
+			  " >= 32bit is not supported\n", _LLU(stripe_length));
+		return -ENOTSUPP;
+	}
+
+	if (0 != (data_map->odm_stripe_unit & ~PAGE_MASK)) {
+		printk(KERN_ERR "Stripe Unit(0x%llx)"
+			  " must be a multiple of PAGE_SIZE(0x%lx)\n",
+			  _LLU(data_map->odm_stripe_unit), PAGE_SIZE);
+		return -ENOTSUPP;
+	}
+
+	return 0;
+}
+
+static void copy_single_comp(struct pnfs_osd_object_cred *cur_comp,
+			     struct pnfs_osd_object_cred *src_comp,
+			     struct caps_buffers *caps_p)
+{
+	WARN_ON(src_comp->oc_cap_key.cred_len > sizeof(caps_p->caps_key));
+	WARN_ON(src_comp->oc_cap.cred_len > sizeof(caps_p->creds));
+
+	*cur_comp = *src_comp;
+
+	memcpy(caps_p->caps_key, src_comp->oc_cap_key.cred,
+	       sizeof(caps_p->caps_key));
+	cur_comp->oc_cap_key.cred = caps_p->caps_key;
+
+	memcpy(caps_p->creds, src_comp->oc_cap.cred,
+	       sizeof(caps_p->creds));
+	cur_comp->oc_cap.cred = caps_p->creds;
+}
+
+extern int objio_alloc_lseg(struct pnfs_layout_segment **outp,
+	struct pnfs_layout_hdr *pnfslay,
+	struct pnfs_layout_range *range,
+	struct xdr_stream *xdr,
+	gfp_t gfp_flags)
+{
+	struct objio_segment *objio_seg;
+	struct pnfs_osd_xdr_decode_layout_iter iter;
+	struct pnfs_osd_layout layout;
+	struct pnfs_osd_object_cred *cur_comp, src_comp;
+	struct caps_buffers *caps_p;
+
+	int err;
+
+	err = pnfs_osd_xdr_decode_layout_map(&layout, &iter, xdr);
+	if (unlikely(err))
+		return err;
+
+	err = _verify_data_map(&layout);
+	if (unlikely(err))
+		return err;
+
+	objio_seg = kzalloc(sizeof(*objio_seg) +
+			    sizeof(*objio_seg->comps) * layout.olo_num_comps +
+			    sizeof(struct caps_buffers) * layout.olo_num_comps,
+			    gfp_flags);
+	if (!objio_seg)
+		return -ENOMEM;
+
+	cur_comp = objio_seg->comps = (void *)(objio_seg + 1);
+	caps_p = (void *)(cur_comp + layout.olo_num_comps);
+	while (pnfs_osd_xdr_decode_layout_comp(&src_comp, &iter, xdr, &err))
+		copy_single_comp(cur_comp++, &src_comp, caps_p++);
+	if (unlikely(err))
+		goto err;
+
+	objio_seg->num_comps = layout.olo_num_comps;
+	objio_seg->comps_index = layout.olo_comps_index;
+
+	objio_seg->mirrors_p1 = layout.olo_map.odm_mirror_cnt + 1;
+	objio_seg->stripe_unit = layout.olo_map.odm_stripe_unit;
+	if (layout.olo_map.odm_group_width) {
+		objio_seg->group_width = layout.olo_map.odm_group_width;
+		objio_seg->group_depth = layout.olo_map.odm_group_depth;
+		objio_seg->group_count = layout.olo_map.odm_num_comps /
+						objio_seg->mirrors_p1 /
+						objio_seg->group_width;
+	} else {
+		objio_seg->group_width = layout.olo_map.odm_num_comps /
+						objio_seg->mirrors_p1;
+		objio_seg->group_depth = -1;
+		objio_seg->group_count = 1;
+	}
+
+	*outp = &objio_seg->lseg;
+	return 0;
+
+err:
+	kfree(objio_seg);
+	dprintk("%s: Error: return %d\n", __func__, err);
+	*outp = NULL;
+	return err;
+}
+
+void objio_free_lseg(struct pnfs_layout_segment *lseg)
+{
+	struct objio_segment *objio_seg = OBJIO_LSEG(lseg);
+
+	kfree(objio_seg);
+}
+
 
 static struct pnfs_layoutdriver_type objlayout_type = {
 	.id = LAYOUT_OSD2_OBJECTS,
 	.name = "LAYOUT_OSD2_OBJECTS",
+
+	.alloc_lseg              = objlayout_alloc_lseg,
+	.free_lseg               = objlayout_free_lseg,
 };
 
 MODULE_DESCRIPTION("pNFS Layout Driver for OSD2 objects");
diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
new file mode 100644
index 0000000..9465267
--- /dev/null
+++ b/fs/nfs/objlayout/objlayout.c
@@ -0,0 +1,104 @@
+/*
+ *  pNFS Objects layout driver high level definitions
+ *
+ *  Copyright (C) 2007 Panasas Inc. [year of first publication]
+ *  All rights reserved.
+ *
+ *  Benny Halevy <bhalevy@panasas.com>
+ *  Boaz Harrosh <bharrosh@panasas.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2
+ *  See the file COPYING included with this distribution for more details.
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ *  1. Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *  2. Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in the
+ *     documentation and/or other materials provided with the distribution.
+ *  3. Neither the name of the Panasas company nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ *  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ *  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ *  DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ *  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ *  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ *  BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ *  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ *  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ *  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <scsi/osd_initiator.h>
+#include "objlayout.h"
+
+#define NFSDBG_FACILITY         NFSDBG_PNFS_LD
+/*
+ * Unmarshall layout and store it in pnfslay.
+ */
+struct pnfs_layout_segment *
+objlayout_alloc_lseg(struct pnfs_layout_hdr *pnfslay,
+		     struct nfs4_layoutget_res *lgr,
+		     gfp_t gfp_flags)
+{
+	int status = -ENOMEM;
+	struct xdr_stream stream;
+	struct xdr_buf buf = {
+		.pages =  lgr->layoutp->pages,
+		.page_len =  lgr->layoutp->len,
+		.buflen =  lgr->layoutp->len,
+		.len = lgr->layoutp->len,
+	};
+	struct page *scratch;
+	struct pnfs_layout_segment *lseg;
+
+	dprintk("%s: Begin pnfslay %p\n", __func__, pnfslay);
+
+	scratch = alloc_page(gfp_flags);
+	if (!scratch)
+		goto err_nofree;
+
+	xdr_init_decode(&stream, &buf, NULL);
+	xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
+
+	status = objio_alloc_lseg(&lseg, pnfslay, &lgr->range, &stream, gfp_flags);
+	if (unlikely(status)) {
+		dprintk("%s: objio_alloc_lseg Return err %d\n", __func__,
+			status);
+		goto err;
+	}
+
+	__free_page(scratch);
+
+	dprintk("%s: Return %p\n", __func__, lseg);
+	return lseg;
+
+err:
+	__free_page(scratch);
+err_nofree:
+	dprintk("%s: Err Return=>%d\n", __func__, status);
+	return ERR_PTR(status);
+}
+
+/*
+ * Free a layout segment
+ */
+void
+objlayout_free_lseg(struct pnfs_layout_segment *lseg)
+{
+	dprintk("%s: freeing layout segment %p\n", __func__, lseg);
+
+	if (unlikely(!lseg))
+		return;
+
+	objio_free_lseg(lseg);
+}
+
diff --git a/fs/nfs/objlayout/objlayout.h b/fs/nfs/objlayout/objlayout.h
new file mode 100644
index 0000000..066280a
--- /dev/null
+++ b/fs/nfs/objlayout/objlayout.h
@@ -0,0 +1,67 @@
+/*
+ *  Data types and function declarations for interfacing with the
+ *  pNFS standard object layout driver.
+ *
+ *  Copyright (C) 2007 Panasas Inc. [year of first publication]
+ *  All rights reserved.
+ *
+ *  Benny Halevy <bhalevy@panasas.com>
+ *  Boaz Harrosh <bharrosh@panasas.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2
+ *  See the file COPYING included with this distribution for more details.
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ *  1. Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *  2. Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in the
+ *     documentation and/or other materials provided with the distribution.
+ *  3. Neither the name of the Panasas company nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ *  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ *  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ *  DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ *  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ *  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ *  BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ *  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ *  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ *  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _OBJLAYOUT_H
+#define _OBJLAYOUT_H
+
+#include <linux/nfs_fs.h>
+#include <linux/pnfs_osd_xdr.h>
+#include "../pnfs.h"
+
+/*
+ * Raid engine I/O API
+ */
+extern int objio_alloc_lseg(struct pnfs_layout_segment **outp,
+	struct pnfs_layout_hdr *pnfslay,
+	struct pnfs_layout_range *range,
+	struct xdr_stream *xdr,
+	gfp_t gfp_flags);
+extern void objio_free_lseg(struct pnfs_layout_segment *lseg);
+
+/*
+ * exported generic objects function vectors
+ */
+extern struct pnfs_layout_segment *objlayout_alloc_lseg(
+	struct pnfs_layout_hdr *,
+	struct nfs4_layoutget_res *,
+	gfp_t gfp_flags);
+extern void objlayout_free_lseg(struct pnfs_layout_segment *);
+
+#endif /* _OBJLAYOUT_H */
-- 
1.7.3.4



* [PATCH v5 20/38] pnfs: per mount layout driver private data
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (18 preceding siblings ...)
  2011-05-22 23:57 ` [PATCH v5 19/38] pnfs-obj: decode layout, alloc/free lseg Benny Halevy
@ 2011-05-22 23:57 ` Benny Halevy
  2011-05-23  4:38   ` Boaz Harrosh
  2011-05-22 23:57 ` [PATCH v5 21/38] pnfs-obj: objio_osd device information retrieval and caching Benny Halevy
                   ` (18 subsequent siblings)
  38 siblings, 1 reply; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:57 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

With the objects layout security model we have object capabilities
that are associated with the layout, and we anticipate that the server
will issue a cb_layoutrecall for any setattr that changes
security-related attributes (user/group/mode/acl) or truncates the file.
Therefore, the client returns the layout in advance to avoid the
extra layout recall.
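
As a rough illustration of the rule described above, here is a minimal
userspace sketch of the decision "does this setattr warrant returning
the layout first?". The CHG_* flags and the function name are
hypothetical and are not taken from the kernel; treating only a size
decrease as a truncation is also an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical change mask; these are NOT the kernel's ATTR_* flags. */
enum {
	CHG_MODE  = 1 << 0,
	CHG_UID   = 1 << 1,
	CHG_GID   = 1 << 2,
	CHG_ACL   = 1 << 3,
	CHG_SIZE  = 1 << 4,
	CHG_MTIME = 1 << 5,
};

/*
 * Return true when the client should return its layout before sending
 * the setattr: either a security-related attribute changes, or the
 * file shrinks (assumed here to be what "truncates" means).
 */
static bool setattr_wants_layoutreturn(unsigned changed,
				       uint64_t old_size, uint64_t new_size)
{
	if (changed & (CHG_MODE | CHG_UID | CHG_GID | CHG_ACL))
		return true;
	if ((changed & CHG_SIZE) && new_size < old_size)
		return true;
	return false;
}
```

A pure timestamp update, for example, would not trigger a layout return
under this sketch.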

[get rid of ds_[rw]size]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 include/linux/nfs_fs_sb.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 87694ca..66e031f 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -143,6 +143,7 @@ struct nfs_server {
 						   filesystem */
 	struct pnfs_layoutdriver_type  *pnfs_curr_ld; /* Active layout driver */
 	struct rpc_wait_queue	roc_rpcwaitq;
+	void			       *pnfs_ld_data; /* Per-mount data */
 
 	/* the following fields are protected by nfs_client->cl_lock */
 	struct rb_root		state_owners;
-- 
1.7.3.4



* [PATCH v5 21/38] pnfs-obj: objio_osd device information retrieval and caching
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (19 preceding siblings ...)
  2011-05-22 23:57 ` [PATCH v5 20/38] pnfs: per mount layout driver private data Benny Halevy
@ 2011-05-22 23:57 ` Benny Halevy
  2011-05-22 23:58 ` [PATCH v5 22/38] pnfs: set/unset layoutdriver Benny Halevy
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:57 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

From: Boaz Harrosh <bharrosh@panasas.com>

When a new layout is received in objio_alloc_lseg, all referenced
device_ids are retrieved. The device information is queried from the MDS
and then the osd_device is looked up in the osd-initiator library. The
devices are cached in a per-mount-point list for later use. At unmount
all devices are "put" back to the library.
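
The find-then-add pattern described above can be sketched in plain
userspace C as follows. This is only an illustration under stated
assumptions: the names (dev_cache, dev_ent, dev_cache_find, ...) are
made up for the example, the device id is assumed to be a 16-byte
opaque value, a void pointer stands in for struct osd_dev, and the
spinlock used by the real code is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEVICEID_SIZE 16	/* assumed length of an opaque device id */

struct deviceid {
	unsigned char data[DEVICEID_SIZE];
};

struct dev_ent {
	struct dev_ent *next;
	struct deviceid id;
	void *dev;		/* stands in for struct osd_dev * */
};

struct dev_cache {
	struct dev_ent *head;	/* one cache per mount point */
};

/* Return the cached device for id, or NULL on a miss. */
static void *dev_cache_find(const struct dev_cache *cache,
			    const struct deviceid *id)
{
	const struct dev_ent *de;

	assert(cache && id);
	for (de = cache->head; de; de = de->next)
		if (memcmp(&de->id, id, sizeof(*id)) == 0)
			return de->dev;
	return NULL;
}

/* Add id => dev unless id is already cached; -1 on allocation failure. */
static int dev_cache_add(struct dev_cache *cache,
			 const struct deviceid *id, void *dev)
{
	struct dev_ent *de;

	if (dev_cache_find(cache, id))
		return 0;	/* first entry wins, as in the patch */

	de = calloc(1, sizeof(*de));
	if (!de)
		return -1;
	de->id = *id;
	de->dev = dev;
	de->next = cache->head;
	cache->head = de;
	return 0;
}

/* "Unmount": release every entry and report how many were dropped. */
static unsigned dev_cache_drain(struct dev_cache *cache)
{
	unsigned n = 0;

	while (cache->head) {
		struct dev_ent *de = cache->head;

		cache->head = de->next;
		free(de);	/* real code would also put the device */
		n++;
	}
	return n;
}
```

A caller would try dev_cache_find() first and only on a miss fetch the
device information from the server and call dev_cache_add().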

objlayout_get_deviceinfo() and objlayout_put_deviceinfo() form the
middleware API for retrieving device information given a device_id.

TODO: The device cache can get big. Cap its size. Keep an LRU and start
      to return devices which were not used when the list gets too big,
      or when allocation of new entries fails.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[gfp_flags]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/objlayout/objio_osd.c |  150 ++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/objlayout/objlayout.c |   68 +++++++++++++++++++
 fs/nfs/objlayout/objlayout.h |    8 ++
 3 files changed, 226 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 5a43fbe..752bf7a 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -46,6 +46,69 @@
 
 #define _LLU(x) ((unsigned long long)x)
 
+/* A per mountpoint struct currently for device cache */
+struct objio_mount_type {
+	struct list_head dev_list;
+	spinlock_t dev_list_lock;
+};
+
+struct _dev_ent {
+	struct list_head list;
+	struct nfs4_deviceid d_id;
+	struct osd_dev *od;
+};
+
+static struct osd_dev *___dev_list_find(struct objio_mount_type *omt,
+	struct nfs4_deviceid *d_id)
+{
+	struct list_head *le;
+
+	list_for_each(le, &omt->dev_list) {
+		struct _dev_ent *de = list_entry(le, struct _dev_ent, list);
+
+		if (0 == memcmp(&de->d_id, d_id, sizeof(*d_id)))
+			return de->od;
+	}
+
+	return NULL;
+}
+
+static struct osd_dev *_dev_list_find(struct objio_mount_type *omt,
+	struct nfs4_deviceid *d_id)
+{
+	struct osd_dev *od;
+
+	spin_lock(&omt->dev_list_lock);
+	od = ___dev_list_find(omt, d_id);
+	spin_unlock(&omt->dev_list_lock);
+	return od;
+}
+
+static int _dev_list_add(struct objio_mount_type *omt,
+	struct nfs4_deviceid *d_id, struct osd_dev *od,
+	gfp_t gfp_flags)
+{
+	struct _dev_ent *de = kzalloc(sizeof(*de), gfp_flags);
+
+	if (!de)
+		return -ENOMEM;
+
+	spin_lock(&omt->dev_list_lock);
+
+	if (___dev_list_find(omt, d_id)) {
+		kfree(de);
+		goto out;
+	}
+
+	de->d_id = *d_id;
+	de->od = od;
+	list_add(&de->list, &omt->dev_list);
+
+out:
+	spin_unlock(&omt->dev_list_lock);
+	return 0;
+}
+
 struct caps_buffers {
 	u8 caps_key[OSD_CRYPTO_KEYID_SIZE];
 	u8 creds[OSD_CAP_LEN];
@@ -74,6 +137,90 @@ OBJIO_LSEG(struct pnfs_layout_segment *lseg)
 	return container_of(lseg, struct objio_segment, lseg);
 }
 
+/* Send and wait for a get_device_info of devices in the layout,
+   then look them up with the osd_initiator library */
+static struct osd_dev *_device_lookup(struct pnfs_layout_hdr *pnfslay,
+				struct objio_segment *objio_seg, unsigned comp,
+				gfp_t gfp_flags)
+{
+	struct pnfs_osd_deviceaddr *deviceaddr;
+	struct nfs4_deviceid *d_id;
+	struct osd_dev *od;
+	struct osd_dev_info odi;
+	struct objio_mount_type *omt =
+				   NFS_SERVER(pnfslay->plh_inode)->pnfs_ld_data;
+	int err;
+
+	d_id = &objio_seg->comps[comp].oc_object_id.oid_device_id;
+
+	od = _dev_list_find(omt, d_id);
+	if (od)
+		return od;
+
+	err = objlayout_get_deviceinfo(pnfslay, d_id, &deviceaddr, gfp_flags);
+	if (unlikely(err)) {
+		dprintk("%s: objlayout_get_deviceinfo dev(%llx:%llx) =>%d\n",
+			__func__, _DEVID_LO(d_id), _DEVID_HI(d_id), err);
+		return ERR_PTR(err);
+	}
+
+	odi.systemid_len = deviceaddr->oda_systemid.len;
+	if (odi.systemid_len > sizeof(odi.systemid)) {
+		err = -EINVAL;
+		goto out;
+	} else if (odi.systemid_len)
+		memcpy(odi.systemid, deviceaddr->oda_systemid.data,
+		       odi.systemid_len);
+	odi.osdname_len	 = deviceaddr->oda_osdname.len;
+	odi.osdname	 = (u8 *)deviceaddr->oda_osdname.data;
+
+	if (!odi.osdname_len && !odi.systemid_len) {
+		dprintk("%s: !odi.osdname_len && !odi.systemid_len\n",
+			__func__);
+		err = -ENODEV;
+		goto out;
+	}
+
+	od = osduld_info_lookup(&odi);
+	if (unlikely(IS_ERR(od))) {
+		err = PTR_ERR(od);
+		dprintk("%s: osduld_info_lookup => %d\n", __func__, err);
+		goto out;
+	}
+
+	_dev_list_add(omt, d_id, od, gfp_flags);
+
+out:
+	dprintk("%s: return=%d\n", __func__, err);
+	objlayout_put_deviceinfo(deviceaddr);
+	return err ? ERR_PTR(err) : od;
+}
+
+static int objio_devices_lookup(struct pnfs_layout_hdr *pnfslay,
+	struct objio_segment *objio_seg,
+	gfp_t gfp_flags)
+{
+	unsigned i;
+	int err;
+
+	/* lookup all devices */
+	for (i = 0; i < objio_seg->num_comps; i++) {
+		struct osd_dev *od;
+
+		od = _device_lookup(pnfslay, objio_seg, i, gfp_flags);
+		if (unlikely(IS_ERR(od))) {
+			err = PTR_ERR(od);
+			goto out;
+		}
+		objio_seg->ods[i] = od;
+	}
+	err = 0;
+
+out:
+	dprintk("%s: return=%d\n", __func__, err);
+	return err;
+}
+
 static int _verify_data_map(struct pnfs_osd_layout *layout)
 {
 	struct pnfs_osd_data_map *data_map = &layout->olo_map;
@@ -170,6 +317,9 @@ extern int objio_alloc_lseg(struct pnfs_layout_segment **outp,
 
 	objio_seg->num_comps = layout.olo_num_comps;
 	objio_seg->comps_index = layout.olo_comps_index;
+	err = objio_devices_lookup(pnfslay, objio_seg, gfp_flags);
+	if (err)
+		goto err;
 
 	objio_seg->mirrors_p1 = layout.olo_map.odm_mirror_cnt + 1;
 	objio_seg->stripe_unit = layout.olo_map.odm_stripe_unit;
diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
index 9465267..10e5fca 100644
--- a/fs/nfs/objlayout/objlayout.c
+++ b/fs/nfs/objlayout/objlayout.c
@@ -102,3 +102,71 @@ objlayout_free_lseg(struct pnfs_layout_segment *lseg)
 	objio_free_lseg(lseg);
 }
 
+/*
+ * Get Device Info API for io engines
+ */
+struct objlayout_deviceinfo {
+	struct page *page;
+	struct pnfs_osd_deviceaddr da; /* This must be last */
+};
+
+/* Initialize and call nfs_getdeviceinfo, then decode and return a
+ * "struct pnfs_osd_deviceaddr *" Eventually objlayout_put_deviceinfo()
+ * should be called.
+ */
+int objlayout_get_deviceinfo(struct pnfs_layout_hdr *pnfslay,
+	struct nfs4_deviceid *d_id, struct pnfs_osd_deviceaddr **deviceaddr,
+	gfp_t gfp_flags)
+{
+	struct objlayout_deviceinfo *odi;
+	struct pnfs_device pd;
+	struct super_block *sb;
+	struct page *page, **pages;
+	u32 *p;
+	int err;
+
+	page = alloc_page(gfp_flags);
+	if (!page)
+		return -ENOMEM;
+
+	pages = &page;
+	pd.pages = pages;
+
+	memcpy(&pd.dev_id, d_id, sizeof(*d_id));
+	pd.layout_type = LAYOUT_OSD2_OBJECTS;
+	pd.pages = &page;
+	pd.pgbase = 0;
+	pd.pglen = PAGE_SIZE;
+	pd.mincount = 0;
+
+	sb = pnfslay->plh_inode->i_sb;
+	err = nfs4_proc_getdeviceinfo(NFS_SERVER(pnfslay->plh_inode), &pd);
+	dprintk("%s nfs_getdeviceinfo returned %d\n", __func__, err);
+	if (err)
+		goto err_out;
+
+	p = page_address(page);
+	odi = kzalloc(sizeof(*odi), gfp_flags);
+	if (!odi) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+	pnfs_osd_xdr_decode_deviceaddr(&odi->da, p);
+	odi->page = page;
+	*deviceaddr = &odi->da;
+	return 0;
+
+err_out:
+	__free_page(page);
+	return err;
+}
+
+void objlayout_put_deviceinfo(struct pnfs_osd_deviceaddr *deviceaddr)
+{
+	struct objlayout_deviceinfo *odi = container_of(deviceaddr,
+						struct objlayout_deviceinfo,
+						da);
+
+	__free_page(odi->page);
+	kfree(odi);
+}
diff --git a/fs/nfs/objlayout/objlayout.h b/fs/nfs/objlayout/objlayout.h
index 066280a..0814271 100644
--- a/fs/nfs/objlayout/objlayout.h
+++ b/fs/nfs/objlayout/objlayout.h
@@ -56,6 +56,14 @@ extern int objio_alloc_lseg(struct pnfs_layout_segment **outp,
 extern void objio_free_lseg(struct pnfs_layout_segment *lseg);
 
 /*
+ * callback API
+ */
+extern int objlayout_get_deviceinfo(struct pnfs_layout_hdr *pnfslay,
+	struct nfs4_deviceid *d_id, struct pnfs_osd_deviceaddr **deviceaddr,
+	gfp_t gfp_flags);
+extern void objlayout_put_deviceinfo(struct pnfs_osd_deviceaddr *deviceaddr);
+
+/*
  * exported generic objects function vectors
  */
 extern struct pnfs_layout_segment *objlayout_alloc_lseg(
-- 
1.7.3.4



* [PATCH v5 22/38] pnfs: set/unset layoutdriver
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (20 preceding siblings ...)
  2011-05-22 23:57 ` [PATCH v5 21/38] pnfs-obj: objio_osd device information retrieval and caching Benny Halevy
@ 2011-05-22 23:58 ` Benny Halevy
  2011-05-22 23:58 ` [PATCH v5 23/38] SQUASHME: pnfs-obj: use global device cache Benny Halevy
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:58 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Add set/unset_layoutdriver methods for managing per-nfs_server layout
driver data.

[was: pass mntfh down the init_pnfs path]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/pnfs.c |   13 ++++++++++++-
 fs/nfs/pnfs.h |    4 ++++
 2 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 20436a5..96506e7 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -75,8 +75,11 @@ find_pnfs_driver(u32 id)
 void
 unset_pnfs_layoutdriver(struct nfs_server *nfss)
 {
-	if (nfss->pnfs_curr_ld)
+	if (nfss->pnfs_curr_ld) {
+		if (nfss->pnfs_curr_ld->unset_layoutdriver)
+			nfss->pnfs_curr_ld->unset_layoutdriver(nfss);
 		module_put(nfss->pnfs_curr_ld->owner);
+	}
 	nfss->pnfs_curr_ld = NULL;
 }
 
@@ -115,6 +118,14 @@ set_pnfs_layoutdriver(struct nfs_server *server, u32 id)
 	}
 	server->pnfs_curr_ld = ld_type;
 
+	if (ld_type->set_layoutdriver &&
+	    ld_type->set_layoutdriver(server)) {
+		dprintk("%s: Error initializing mount point for layout driver %u.\n",
+		       __func__, id);
+		module_put(ld_type->owner);
+		goto out_no_driver;
+	}
+
 	dprintk("%s: pNFS module for %u set\n", __func__, id);
 	return;
 
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 7417be9..f118134 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -73,6 +73,10 @@ struct pnfs_layoutdriver_type {
 	const u32 id;
 	const char *name;
 	struct module *owner;
+
+	int (*set_layoutdriver) (struct nfs_server *);
+	int (*unset_layoutdriver) (struct nfs_server *);
+
 	struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_hdr *layoutid, struct nfs4_layoutget_res *lgr, gfp_t gfp_flags);
 	void (*free_lseg) (struct pnfs_layout_segment *lseg);
 
-- 
1.7.3.4



* [PATCH v5 23/38] SQUASHME: pnfs-obj: use global device cache
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (21 preceding siblings ...)
  2011-05-22 23:58 ` [PATCH v5 22/38] pnfs: set/unset layoutdriver Benny Halevy
@ 2011-05-22 23:58 ` Benny Halevy
  2011-05-23  4:52   ` Boaz Harrosh
  2011-05-22 23:59 ` [PATCH v5 24/38] SQUASHME: Revert "pnfs: per mount layout driver private data" Benny Halevy
                   ` (15 subsequent siblings)
  38 siblings, 1 reply; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:58 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/objlayout/objio_osd.c |  102 ++++++++++++++++++++----------------------
 1 files changed, 49 insertions(+), 53 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 752bf7a..bcc8468 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -46,66 +46,55 @@
 
 #define _LLU(x) ((unsigned long long)x)
 
-/* A per mountpoint struct currently for device cache */
-struct objio_mount_type {
-	struct list_head dev_list;
-	spinlock_t dev_list_lock;
-};
-
-struct _dev_ent {
-	struct list_head list;
-	struct nfs4_deviceid d_id;
+struct objio_dev_ent {
+	struct nfs4_deviceid_node id_node;
 	struct osd_dev *od;
 };
 
-static struct osd_dev *___dev_list_find(struct objio_mount_type *omt,
-	struct nfs4_deviceid *d_id)
+static void
+objio_free_deviceid_node(struct nfs4_deviceid_node *d)
 {
-	struct list_head *le;
+	struct objio_dev_ent *de = container_of(d, struct objio_dev_ent, id_node);
 
-	list_for_each(le, &omt->dev_list) {
-		struct _dev_ent *de = list_entry(le, struct _dev_ent, list);
-
-		if (0 == memcmp(&de->d_id, d_id, sizeof(*d_id)))
-			return de->od;
-	}
-
-	return NULL;
+	osduld_put_device(de->od);
+	kfree(de);
 }
 
-static struct osd_dev *_dev_list_find(struct objio_mount_type *omt,
-	struct nfs4_deviceid *d_id)
+static struct objio_dev_ent *_dev_list_find(const struct nfs_client *clp,
+	const struct nfs4_deviceid *d_id)
 {
-	struct osd_dev *od;
+	struct nfs4_deviceid_node *d;
 
-	spin_lock(&omt->dev_list_lock);
-	od = ___dev_list_find(omt, d_id);
-	spin_unlock(&omt->dev_list_lock);
-	return od;
+	d = nfs4_find_get_deviceid(clp, d_id);
+	if (!d)
+		return NULL;
+	return container_of(d, struct objio_dev_ent, id_node);
 }
 
-static int _dev_list_add(struct objio_mount_type *omt,
-	struct nfs4_deviceid *d_id, struct osd_dev *od,
+static int _dev_list_add(const struct nfs_server *nfss,
+	const struct nfs4_deviceid *d_id, struct osd_dev *od,
 	gfp_t gfp_flags)
 {
-	struct _dev_ent *de = kzalloc(sizeof(*de), gfp_flags);
+	struct nfs4_deviceid_node *d;
+	struct objio_dev_ent *de = kzalloc(sizeof(*de), gfp_flags);
+	struct objio_dev_ent *n;
 
 	if (!de)
 		return -ENOMEM;
 
-	spin_lock(&omt->dev_list_lock);
+	nfs4_init_deviceid_node(&de->id_node,
+				nfss->pnfs_curr_ld,
+				nfss->nfs_client,
+				d_id);
+	de->od = od;
 
-	if (___dev_list_find(omt, d_id)) {
-		kfree(de);
-		goto out;
+	d = nfs4_insert_deviceid_node(&de->id_node);
+	n = container_of(d, struct objio_dev_ent, id_node);
+	if (n != de) {
+		BUG_ON(n->od != od);
+		objio_free_deviceid_node(&de->id_node);
 	}
 
-	de->d_id = *d_id;
-	de->od = od;
-	list_add(&de->list, &omt->dev_list);
-
-out:
-	spin_unlock(&omt->dev_list_lock);
 	return 0;
 }
 
@@ -128,7 +117,7 @@ struct objio_segment {
 	unsigned comps_index;
 	unsigned num_comps;
 	/* variable length */
-	struct osd_dev	*ods[1];
+	struct objio_dev_ent *ods[1];
 };
 
 static inline struct objio_segment *
@@ -139,23 +128,22 @@ OBJIO_LSEG(struct pnfs_layout_segment *lseg)
 
 /* Send and wait for a get_device_info of devices in the layout,
    then look them up with the osd_initiator library */
-static struct osd_dev *_device_lookup(struct pnfs_layout_hdr *pnfslay,
+static struct objio_dev_ent *_device_lookup(struct pnfs_layout_hdr *pnfslay,
 				struct objio_segment *objio_seg, unsigned comp,
 				gfp_t gfp_flags)
 {
 	struct pnfs_osd_deviceaddr *deviceaddr;
 	struct nfs4_deviceid *d_id;
+	struct objio_dev_ent *ode;
 	struct osd_dev *od;
 	struct osd_dev_info odi;
-	struct objio_mount_type *omt =
-				   NFS_SERVER(pnfslay->plh_inode)->pnfs_ld_data;
 	int err;
 
 	d_id = &objio_seg->comps[comp].oc_object_id.oid_device_id;
 
-	od = _dev_list_find(omt, d_id);
-	if (od)
-		return od;
+	ode = _dev_list_find(NFS_SERVER(pnfslay->plh_inode)->nfs_client, d_id);
+	if (ode)
+		return ode;
 
 	err = objlayout_get_deviceinfo(pnfslay, d_id, &deviceaddr, gfp_flags);
 	if (unlikely(err)) {
@@ -188,7 +176,7 @@ static struct osd_dev *_device_lookup(struct pnfs_layout_hdr *pnfslay,
 		goto out;
 	}
 
-	_dev_list_add(omt, d_id, od, gfp_flags);
+	_dev_list_add(NFS_SERVER(pnfslay->plh_inode), d_id, od, gfp_flags);
 
 out:
 	dprintk("%s: return=%d\n", __func__, err);
@@ -205,14 +193,14 @@ static int objio_devices_lookup(struct pnfs_layout_hdr *pnfslay,
 
 	/* lookup all devices */
 	for (i = 0; i < objio_seg->num_comps; i++) {
-		struct osd_dev *od;
+		struct objio_dev_ent *ode;
 
-		od = _device_lookup(pnfslay, objio_seg, i, gfp_flags);
-		if (unlikely(IS_ERR(od))) {
-			err = PTR_ERR(od);
+		ode = _device_lookup(pnfslay, objio_seg, i, gfp_flags);
+		if (unlikely(IS_ERR(ode))) {
+			err = PTR_ERR(ode);
 			goto out;
 		}
-		objio_seg->ods[i] = od;
+		objio_seg->ods[i] = ode;
 	}
 	err = 0;
 
@@ -348,8 +336,14 @@ err:
 
 void objio_free_lseg(struct pnfs_layout_segment *lseg)
 {
+	int i;
 	struct objio_segment *objio_seg = OBJIO_LSEG(lseg);
 
+	for (i = 0; i < objio_seg->num_comps; i++) {
+		if (!objio_seg->ods[i])
+			break;
+		nfs4_put_deviceid_node(&objio_seg->ods[i]->id_node);
+	}
 	kfree(objio_seg);
 }
 
@@ -360,6 +354,8 @@ static struct pnfs_layoutdriver_type objlayout_type = {
 
 	.alloc_lseg              = objlayout_alloc_lseg,
 	.free_lseg               = objlayout_free_lseg,
+
+	.free_deviceid_node	 = objio_free_deviceid_node,
 };
 
 MODULE_DESCRIPTION("pNFS Layout Driver for OSD2 objects");
-- 
1.7.3.4



* [PATCH v5 24/38] SQUASHME: Revert "pnfs: per mount layout driver private data"
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (22 preceding siblings ...)
  2011-05-22 23:58 ` [PATCH v5 23/38] SQUASHME: pnfs-obj: use global device cache Benny Halevy
@ 2011-05-22 23:59 ` Benny Halevy
  2011-05-22 23:59 ` [PATCH v5 25/38] SQUASHME: Revert "pnfs: set/unset layoutdriver" Benny Halevy
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:59 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

This reverts commit 52aebd8e7a43cf487403048c49e9eccd829a78ab.
---
 include/linux/nfs_fs_sb.h |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 66e031f..87694ca 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -143,7 +143,6 @@ struct nfs_server {
 						   filesystem */
 	struct pnfs_layoutdriver_type  *pnfs_curr_ld; /* Active layout driver */
 	struct rpc_wait_queue	roc_rpcwaitq;
-	void			       *pnfs_ld_data; /* Per-mount data */
 
 	/* the following fields are protected by nfs_client->cl_lock */
 	struct rb_root		state_owners;
-- 
1.7.3.4



* [PATCH v5 25/38] SQUASHME: Revert "pnfs: set/unset layoutdriver"
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (23 preceding siblings ...)
  2011-05-22 23:59 ` [PATCH v5 24/38] SQUASHME: Revert "pnfs: per mount layout driver private data" Benny Halevy
@ 2011-05-22 23:59 ` Benny Halevy
  2011-05-22 23:59 ` [PATCH v5 26/38] NFSv4.1: use layout driver in global device cache Benny Halevy
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:59 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

This reverts commit 857e1d3081acad3a4db2e3506038b6a0e3f5a8cc.
---
 fs/nfs/pnfs.c |   13 +------------
 fs/nfs/pnfs.h |    4 ----
 2 files changed, 1 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 96506e7..20436a5 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -75,11 +75,8 @@ find_pnfs_driver(u32 id)
 void
 unset_pnfs_layoutdriver(struct nfs_server *nfss)
 {
-	if (nfss->pnfs_curr_ld) {
-		if (nfss->pnfs_curr_ld->unset_layoutdriver)
-			nfss->pnfs_curr_ld->unset_layoutdriver(nfss);
+	if (nfss->pnfs_curr_ld)
 		module_put(nfss->pnfs_curr_ld->owner);
-	}
 	nfss->pnfs_curr_ld = NULL;
 }
 
@@ -118,14 +115,6 @@ set_pnfs_layoutdriver(struct nfs_server *server, u32 id)
 	}
 	server->pnfs_curr_ld = ld_type;
 
-	if (ld_type->set_layoutdriver &&
-	    ld_type->set_layoutdriver(server)) {
-		dprintk("%s: Error initializing mount point for layout driver %u.\n",
-		       __func__, id);
-		module_put(ld_type->owner);
-		goto out_no_driver;
-	}
-
 	dprintk("%s: pNFS module for %u set\n", __func__, id);
 	return;
 
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index f118134..7417be9 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -73,10 +73,6 @@ struct pnfs_layoutdriver_type {
 	const u32 id;
 	const char *name;
 	struct module *owner;
-
-	int (*set_layoutdriver) (struct nfs_server *);
-	int (*unset_layoutdriver) (struct nfs_server *);
-
 	struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_hdr *layoutid, struct nfs4_layoutget_res *lgr, gfp_t gfp_flags);
 	void (*free_lseg) (struct pnfs_layout_segment *lseg);
 
-- 
1.7.3.4



* [PATCH v5 26/38] NFSv4.1: use layout driver in global device cache
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (24 preceding siblings ...)
  2011-05-22 23:59 ` [PATCH v5 25/38] SQUASHME: Revert "pnfs: set/unset layoutdriver" Benny Halevy
@ 2011-05-22 23:59 ` Benny Halevy
  2011-05-23  0:00 ` [PATCH v5 27/38] pnfs: alloc and free layout_hdr layoutdriver methods Benny Halevy
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-22 23:59 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

pnfs deviceids are unique per server, per layout type.
struct nfs_client is currently used to distinguish deviceids from
different nfs servers, yet these may clash between different layout
types on the same server.  Therefore, use the layout driver associated
with each deviceid at insertion time to look it up, unhash, or
delete it.
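
As a rough userspace illustration of the keying described above, a lookup
that matches on the layout driver, the client, and the deviceid together
could look like the sketch below. The struct and function names are
hypothetical stand-ins, not the kernel types, and a plain linked list
replaces the RCU-protected hash table used by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEVICEID_SIZE 16

/* Hypothetical stand-ins for the kernel types; only pointer identity matters. */
struct layout_driver { unsigned id; };
struct client { int unused; };

struct deviceid { unsigned char data[DEVICEID_SIZE]; };

struct deviceid_node {
	const struct layout_driver *ld;	/* layout driver that inserted the node */
	const struct client *clp;	/* server the deviceid came from */
	struct deviceid id;
	struct deviceid_node *next;
};

/* Return the node matching all three parts of the key, or NULL if none does. */
static struct deviceid_node *
lookup_deviceid(struct deviceid_node *head, const struct layout_driver *ld,
		const struct client *clp, const struct deviceid *id)
{
	struct deviceid_node *d;

	assert(id != NULL);
	for (d = head; d != NULL; d = d->next)
		if (d->ld == ld && d->clp == clp &&
		    memcmp(&d->id, id, sizeof(*id)) == 0)
			return d;
	return NULL;
}
```

With this key, two layout types on the same server may hand out the same
deviceid bytes without one entry shadowing the other.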

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/callback_proc.c       |    2 +-
 fs/nfs/nfs4filelayout.c      |    3 ++-
 fs/nfs/objlayout/objio_osd.c |    6 +++---
 fs/nfs/pnfs.h                |    6 +++---
 fs/nfs/pnfs_dev.c            |   28 +++++++++++++++++-----------
 5 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 5780c37..d4d1954 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -278,7 +278,7 @@ __be32 nfs4_callback_devicenotify(struct cb_devicenotifyargs *args,
 		if (dev->cbd_notify_type == NOTIFY_DEVICEID4_CHANGE)
 			dprintk("%s: NOTIFY_DEVICEID4_CHANGE not supported, "
 				"deleting instead\n", __func__);
-		nfs4_delete_deviceid(clp, &dev->cbd_dev_id);
+		nfs4_delete_deviceid(server->pnfs_curr_ld, clp, &dev->cbd_dev_id);
 	}
 
 out:
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 5b3080d..2529db0 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -441,7 +441,8 @@ filelayout_check_layout(struct pnfs_layout_hdr *lo,
 	}
 
 	/* find and reference the deviceid */
-	d = nfs4_find_get_deviceid(NFS_SERVER(lo->plh_inode)->nfs_client, id);
+	d = nfs4_find_get_deviceid(NFS_SERVER(lo->plh_inode)->pnfs_curr_ld,
+				   NFS_SERVER(lo->plh_inode)->nfs_client, id);
 	if (d == NULL) {
 		dsaddr = get_device_info(lo->plh_inode, id, gfp_flags);
 		if (dsaddr == NULL)
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index bcc8468..a4201d8 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -60,12 +60,12 @@ objio_free_deviceid_node(struct nfs4_deviceid_node *d)
 	kfree(de);
 }
 
-static struct objio_dev_ent *_dev_list_find(const struct nfs_client *clp,
+static struct objio_dev_ent *_dev_list_find(const struct nfs_server *nfss,
 	const struct nfs4_deviceid *d_id)
 {
 	struct nfs4_deviceid_node *d;
 
-	d = nfs4_find_get_deviceid(clp, d_id);
+	d = nfs4_find_get_deviceid(nfss->pnfs_curr_ld, nfss->nfs_client, d_id);
 	if (!d)
 		return NULL;
 	return container_of(d, struct objio_dev_ent, id_node);
@@ -141,7 +141,7 @@ static struct objio_dev_ent *_device_lookup(struct pnfs_layout_hdr *pnfslay,
 
 	d_id = &objio_seg->comps[comp].oc_object_id.oid_device_id;
 
-	ode = _dev_list_find(NFS_SERVER(pnfslay->plh_inode)->nfs_client, d_id);
+	ode = _dev_list_find(NFS_SERVER(pnfslay->plh_inode), d_id);
 	if (ode)
 		return ode;
 
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 7417be9..2e94ed3 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -172,9 +172,9 @@ struct nfs4_deviceid_node {
 };
 
 void nfs4_print_deviceid(const struct nfs4_deviceid *dev_id);
-struct nfs4_deviceid_node *nfs4_find_get_deviceid(const struct nfs_client *, const struct nfs4_deviceid *);
-struct nfs4_deviceid_node *nfs4_unhash_put_deviceid(const struct nfs_client *, const struct nfs4_deviceid *);
-void nfs4_delete_deviceid(const struct nfs_client *, const struct nfs4_deviceid *);
+struct nfs4_deviceid_node *nfs4_find_get_deviceid(const struct pnfs_layoutdriver_type *, const struct nfs_client *, const struct nfs4_deviceid *);
+struct nfs4_deviceid_node *nfs4_unhash_put_deviceid(const struct pnfs_layoutdriver_type *, const struct nfs_client *, const struct nfs4_deviceid *);
+void nfs4_delete_deviceid(const struct pnfs_layoutdriver_type *, const struct nfs_client *, const struct nfs4_deviceid *);
 void nfs4_init_deviceid_node(struct nfs4_deviceid_node *,
 			     const struct pnfs_layoutdriver_type *,
 			     const struct nfs_client *,
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index f830616..7997899 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -67,14 +67,16 @@ nfs4_deviceid_hash(const struct nfs4_deviceid *id)
 }
 
 static struct nfs4_deviceid_node *
-_lookup_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id,
+_lookup_deviceid(const struct pnfs_layoutdriver_type *ld,
+		 const struct nfs_client *clp, const struct nfs4_deviceid *id,
 		 long hash)
 {
 	struct nfs4_deviceid_node *d;
 	struct hlist_node *n;
 
 	hlist_for_each_entry_rcu(d, n, &nfs4_deviceid_cache[hash], node)
-		if (d->nfs_client == clp && !memcmp(&d->deviceid, id, sizeof(*id))) {
+		if (d->ld == ld && d->nfs_client == clp &&
+		    !memcmp(&d->deviceid, id, sizeof(*id))) {
 			if (atomic_read(&d->ref))
 				return d;
 			else
@@ -90,13 +92,14 @@ _lookup_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id,
  * @id deviceid to look up
  */
 struct nfs4_deviceid_node *
-_find_get_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id,
+_find_get_deviceid(const struct pnfs_layoutdriver_type *ld,
+		   const struct nfs_client *clp, const struct nfs4_deviceid *id,
 		   long hash)
 {
 	struct nfs4_deviceid_node *d;
 
 	rcu_read_lock();
-	d = _lookup_deviceid(clp, id, hash);
+	d = _lookup_deviceid(ld, clp, id, hash);
 	if (!atomic_inc_not_zero(&d->ref))
 		d = NULL;
 	rcu_read_unlock();
@@ -104,9 +107,10 @@ _find_get_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id,
 }
 
 struct nfs4_deviceid_node *
-nfs4_find_get_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
+nfs4_find_get_deviceid(const struct pnfs_layoutdriver_type *ld,
+		       const struct nfs_client *clp, const struct nfs4_deviceid *id)
 {
-	return _find_get_deviceid(clp, id, nfs4_deviceid_hash(id));
+	return _find_get_deviceid(ld, clp, id, nfs4_deviceid_hash(id));
 }
 EXPORT_SYMBOL_GPL(nfs4_find_get_deviceid);
 
@@ -119,13 +123,14 @@ EXPORT_SYMBOL_GPL(nfs4_find_get_deviceid);
  * @ret the unhashed node, if found and dereferenced to zero, NULL otherwise.
  */
 struct nfs4_deviceid_node *
-nfs4_unhash_put_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
+nfs4_unhash_put_deviceid(const struct pnfs_layoutdriver_type *ld,
+			 const struct nfs_client *clp, const struct nfs4_deviceid *id)
 {
 	struct nfs4_deviceid_node *d;
 
 	spin_lock(&nfs4_deviceid_lock);
 	rcu_read_lock();
-	d = _lookup_deviceid(clp, id, nfs4_deviceid_hash(id));
+	d = _lookup_deviceid(ld, clp, id, nfs4_deviceid_hash(id));
 	rcu_read_unlock();
 	if (!d) {
 		spin_unlock(&nfs4_deviceid_lock);
@@ -150,11 +155,12 @@ EXPORT_SYMBOL_GPL(nfs4_unhash_put_deviceid);
  * @id deviceid to delete
  */
 void
-nfs4_delete_deviceid(const struct nfs_client *clp, const struct nfs4_deviceid *id)
+nfs4_delete_deviceid(const struct pnfs_layoutdriver_type *ld,
+		     const struct nfs_client *clp, const struct nfs4_deviceid *id)
 {
 	struct nfs4_deviceid_node *d;
 
-	d = nfs4_unhash_put_deviceid(clp, id);
+	d = nfs4_unhash_put_deviceid(ld, clp, id);
 	if (!d)
 		return;
 	if (d->ld->free_deviceid_node)
@@ -197,7 +203,7 @@ nfs4_insert_deviceid_node(struct nfs4_deviceid_node *new)
 
 	spin_lock(&nfs4_deviceid_lock);
 	hash = nfs4_deviceid_hash(&new->deviceid);
-	d = _find_get_deviceid(new->nfs_client, &new->deviceid, hash);
+	d = _find_get_deviceid(new->ld, new->nfs_client, &new->deviceid, hash);
 	if (d) {
 		spin_unlock(&nfs4_deviceid_lock);
 		return d;
-- 
1.7.3.4



* [PATCH v5 27/38] pnfs: alloc and free layout_hdr layoutdriver methods
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (25 preceding siblings ...)
  2011-05-22 23:59 ` [PATCH v5 26/38] NFSv4.1: use layout driver in global device cache Benny Halevy
@ 2011-05-23  0:00 ` Benny Halevy
  2011-05-23  0:00 ` [PATCH v5 28/38] pnfs-obj: define per-inode private structure Benny Halevy
                   ` (11 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:00 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

[gfp_flags]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/pnfs.c |   21 ++++++++++++++++++---
 fs/nfs/pnfs.h |    4 ++++
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 20436a5..ef535f2 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -177,13 +177,28 @@ get_layout_hdr(struct pnfs_layout_hdr *lo)
 	atomic_inc(&lo->plh_refcount);
 }
 
+static struct pnfs_layout_hdr *
+pnfs_alloc_layout_hdr(struct inode *ino, gfp_t gfp_flags)
+{
+	struct pnfs_layoutdriver_type *ld = NFS_SERVER(ino)->pnfs_curr_ld;
+	return ld->alloc_layout_hdr ? ld->alloc_layout_hdr(ino, gfp_flags) :
+		kzalloc(sizeof(struct pnfs_layout_hdr), gfp_flags);
+}
+
+static void
+pnfs_free_layout_hdr(struct pnfs_layout_hdr *lo)
+{
+	struct pnfs_layoutdriver_type *ld = NFS_SERVER(lo->plh_inode)->pnfs_curr_ld;
+	return ld->alloc_layout_hdr ? ld->free_layout_hdr(lo) : kfree(lo);
+}
+
 static void
 destroy_layout_hdr(struct pnfs_layout_hdr *lo)
 {
 	dprintk("%s: freeing layout cache %p\n", __func__, lo);
 	BUG_ON(!list_empty(&lo->plh_layouts));
 	NFS_I(lo->plh_inode)->layout = NULL;
-	kfree(lo);
+	pnfs_free_layout_hdr(lo);
 }
 
 static void
@@ -744,7 +759,7 @@ alloc_init_layout_hdr(struct inode *ino, gfp_t gfp_flags)
 {
 	struct pnfs_layout_hdr *lo;
 
-	lo = kzalloc(sizeof(struct pnfs_layout_hdr), gfp_flags);
+	lo = pnfs_alloc_layout_hdr(ino, gfp_flags);
 	if (!lo)
 		return NULL;
 	atomic_set(&lo->plh_refcount, 1);
@@ -777,7 +792,7 @@ pnfs_find_alloc_layout(struct inode *ino, gfp_t gfp_flags)
 	if (likely(nfsi->layout == NULL))	/* Won the race? */
 		nfsi->layout = new;
 	else
-		kfree(new);
+		pnfs_free_layout_hdr(new);
 	return nfsi->layout;
 }
 
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 2e94ed3..ed167a7 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -73,6 +73,10 @@ struct pnfs_layoutdriver_type {
 	const u32 id;
 	const char *name;
 	struct module *owner;
+
+	struct pnfs_layout_hdr * (*alloc_layout_hdr) (struct inode *inode, gfp_t gfp_flags);
+	void (*free_layout_hdr) (struct pnfs_layout_hdr *);
+
 	struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_hdr *layoutid, struct nfs4_layoutget_res *lgr, gfp_t gfp_flags);
 	void (*free_lseg) (struct pnfs_layout_segment *lseg);
 
-- 
1.7.3.4



* [PATCH v5 28/38] pnfs-obj: define per-inode private structure
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (26 preceding siblings ...)
  2011-05-23  0:00 ` [PATCH v5 27/38] pnfs: alloc and free layout_hdr layoutdriver methods Benny Halevy
@ 2011-05-23  0:00 ` Benny Halevy
  2011-05-23  0:00 ` [PATCH v5 29/38] pnfs: support for non-rpc layout drivers Benny Halevy
                   ` (10 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:00 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Allocate and deallocate a per-inode private pnfs_layout_hdr
in preparation for the I/O implementation.
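
The embedding pattern used here can be sketched in userspace C as follows.
This is only an illustration under assumptions: the struct names are
hypothetical, calloc()/free() stand in for kzalloc()/kfree(), and the
container_of() macro is a simplified userspace equivalent of the kernel one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct layout_hdr { int refcount; };	/* hypothetical generic header */

struct obj_layout {			/* hypothetical driver-private wrapper */
	int private_state;		/* driver data placed before the header */
	struct layout_hdr hdr;
};

/* Allocate the wrapper and hand the embedded generic header to the caller. */
static struct layout_hdr *obj_alloc_layout_hdr(void)
{
	struct obj_layout *ol = calloc(1, sizeof(*ol));

	/* Check before taking the member address; hdr need not be first. */
	return ol ? &ol->hdr : NULL;
}

/* Recover the wrapper from the generic header and free the whole object. */
static void obj_free_layout_hdr(struct layout_hdr *lo)
{
	assert(lo != NULL);
	free(container_of(lo, struct obj_layout, hdr));
}
```

The generic layer only ever sees the header pointer, while the driver can
reach its private fields through container_of().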

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/objlayout/objio_osd.c |    3 +++
 fs/nfs/objlayout/objlayout.c |   26 ++++++++++++++++++++++++++
 fs/nfs/objlayout/objlayout.h |   17 +++++++++++++++++
 3 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index a4201d8..a71f58e 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -352,6 +352,9 @@ static struct pnfs_layoutdriver_type objlayout_type = {
 	.id = LAYOUT_OSD2_OBJECTS,
 	.name = "LAYOUT_OSD2_OBJECTS",
 
+	.alloc_layout_hdr        = objlayout_alloc_layout_hdr,
+	.free_layout_hdr         = objlayout_free_layout_hdr,
+
 	.alloc_lseg              = objlayout_alloc_lseg,
 	.free_lseg               = objlayout_free_lseg,
 
diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
index 10e5fca..f14b4da 100644
--- a/fs/nfs/objlayout/objlayout.c
+++ b/fs/nfs/objlayout/objlayout.c
@@ -42,6 +42,32 @@
 
 #define NFSDBG_FACILITY         NFSDBG_PNFS_LD
 /*
+ * Create an objlayout layout structure for the given inode and return it.
+ */
+struct pnfs_layout_hdr *
+objlayout_alloc_layout_hdr(struct inode *inode, gfp_t gfp_flags)
+{
+	struct objlayout *objlay;
+
+	objlay = kzalloc(sizeof(struct objlayout), gfp_flags);
+	dprintk("%s: Return %p\n", __func__, objlay);
+	return &objlay->pnfs_layout;
+}
+
+/*
+ * Free an objlayout layout structure
+ */
+void
+objlayout_free_layout_hdr(struct pnfs_layout_hdr *lo)
+{
+	struct objlayout *objlay = OBJLAYOUT(lo);
+
+	dprintk("%s: objlay %p\n", __func__, objlay);
+
+	kfree(objlay);
+}
+
+/*
  * Unmarshall layout and store it in pnfslay.
  */
 struct pnfs_layout_segment *
diff --git a/fs/nfs/objlayout/objlayout.h b/fs/nfs/objlayout/objlayout.h
index 0814271..fa02621 100644
--- a/fs/nfs/objlayout/objlayout.h
+++ b/fs/nfs/objlayout/objlayout.h
@@ -46,6 +46,19 @@
 #include "../pnfs.h"
 
 /*
+ * per-inode layout
+ */
+struct objlayout {
+	struct pnfs_layout_hdr pnfs_layout;
+};
+
+static inline struct objlayout *
+OBJLAYOUT(struct pnfs_layout_hdr *lo)
+{
+	return container_of(lo, struct objlayout, pnfs_layout);
+}
+
+/*
  * Raid engine I/O API
  */
 extern int objio_alloc_lseg(struct pnfs_layout_segment **outp,
@@ -66,6 +79,10 @@ extern void objlayout_put_deviceinfo(struct pnfs_osd_deviceaddr *deviceaddr);
 /*
  * exported generic objects function vectors
  */
+
+extern struct pnfs_layout_hdr *objlayout_alloc_layout_hdr(struct inode *, gfp_t gfp_flags);
+extern void objlayout_free_layout_hdr(struct pnfs_layout_hdr *);
+
 extern struct pnfs_layout_segment *objlayout_alloc_lseg(
 	struct pnfs_layout_hdr *,
 	struct nfs4_layoutget_res *,
-- 
1.7.3.4



* [PATCH v5 29/38] pnfs: support for non-rpc layout drivers
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (27 preceding siblings ...)
  2011-05-23  0:00 ` [PATCH v5 28/38] pnfs-obj: define per-inode private structure Benny Halevy
@ 2011-05-23  0:00 ` Benny Halevy
  2011-05-23  0:01 ` [PATCH v5 30/38] SQUASHME: initialize data->task on the non-rpc io done success paths Benny Halevy
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:00 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Non-RPC layout drivers, such as those for objects and blocks,
implement their own I/O path and error handling logic.
Therefore, bypass the NFS-based error handling for these layout drivers.
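
The control flow of the completion helpers can be sketched in userspace C
as below. The struct and helper names are hypothetical; generic_complete()
and resend_via_mds() merely stand in for the mds_ops callbacks and for
nfs_initiate_read()/nfs_initiate_write(), and the lseg reference handling
is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct io_data {
	int pnfs_error;		/* 0 on success, negative errno otherwise */
	int completed;		/* set once the generic completion has run */
	int resent_via_mds;	/* set once the I/O was re-queued to the MDS */
};

/* Stand-in for rpc_call_done() followed by rpc_release(). */
static void generic_complete(struct io_data *d)
{
	d->completed = 1;
}

/* Stand-in for re-issuing the request through the metadata server. */
static int resend_via_mds(struct io_data *d)
{
	d->resent_via_mds = 1;
	return 0;
}

/* Called by a non-RPC layout driver when its own I/O has finished. */
static int ld_io_done(struct io_data *data)
{
	int status;

	assert(data != NULL);
	if (!data->pnfs_error) {
		/* Success: skip NFS error handling, complete directly. */
		generic_complete(data);
		return 0;
	}

	/* Driver-level failure: fall back to the MDS and tell the caller. */
	status = resend_via_mds(data);
	return status ? status : -EAGAIN;
}
```

A return value of -EAGAIN tells the layout driver that the request was
handed back to the MDS path rather than completed.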

[fix lseg ref-count bugs, and null de-refs]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[get rid of PNFS_USE_RPC_CODE]
[get rid of __nfs4_write_done_cb]
[revert useless change in nfs4_write_done_cb]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/internal.h       |    1 +
 fs/nfs/nfs4proc.c       |   13 +++++++++--
 fs/nfs/pnfs.c           |   52 ++++++++++++++++++++++++++++++++++++++++++++++-
 fs/nfs/pnfs.h           |    2 +
 include/linux/nfs_xdr.h |    2 +
 5 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index ce118ce..bcf0f0f 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -310,6 +310,7 @@ extern int nfs_migrate_page(struct address_space *,
 #endif
 
 /* nfs4proc.c */
+extern void __nfs4_read_done_cb(struct nfs_read_data *);
 extern void nfs4_reset_read(struct rpc_task *task, struct nfs_read_data *data);
 extern int nfs4_init_client(struct nfs_client *clp,
 			    const struct rpc_timeout *timeparms,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index cf1b339..92c8bc4 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3175,6 +3175,11 @@ static int nfs4_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
 	return err;
 }
 
+void __nfs4_read_done_cb(struct nfs_read_data *data)
+{
+	nfs_invalidate_atime(data->inode);
+}
+
 static int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
 {
 	struct nfs_server *server = NFS_SERVER(data->inode);
@@ -3184,7 +3189,7 @@ static int nfs4_read_done_cb(struct rpc_task *task, struct nfs_read_data *data)
 		return -EAGAIN;
 	}
 
-	nfs_invalidate_atime(data->inode);
+	__nfs4_read_done_cb(data);
 	if (task->tk_status > 0)
 		renew_lease(server, data->timestamp);
 	return 0;
@@ -3198,7 +3203,8 @@ static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
 
-	return data->read_done_cb(task, data);
+	return data->read_done_cb ? data->read_done_cb(task, data) :
+				    nfs4_read_done_cb(task, data);
 }
 
 static void nfs4_proc_read_setup(struct nfs_read_data *data, struct rpc_message *msg)
@@ -3243,7 +3249,8 @@ static int nfs4_write_done(struct rpc_task *task, struct nfs_write_data *data)
 {
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return -EAGAIN;
-	return data->write_done_cb(task, data);
+	return data->write_done_cb ? data->write_done_cb(task, data) :
+		nfs4_write_done_cb(task, data);
 }
 
 /* Reset the the nfs_write_data to send the write to the MDS. */
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index ef535f2..0f59802 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -243,7 +243,7 @@ put_lseg_common(struct pnfs_layout_segment *lseg)
 {
 	struct inode *inode = lseg->pls_layout->plh_inode;
 
-	BUG_ON(test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
+	WARN_ON(test_bit(NFS_LSEG_VALID, &lseg->pls_flags));
 	list_del_init(&lseg->pls_list);
 	if (list_empty(&lseg->pls_layout->plh_segs)) {
 		set_bit(NFS_LAYOUT_DESTROYED, &lseg->pls_layout->plh_flags);
@@ -1054,6 +1054,31 @@ pnfs_pageio_init_write(struct nfs_pageio_descriptor *pgio, struct inode *inode)
 	pgio->pg_test = (ld && ld->pg_test) ? pnfs_write_pg_test : NULL;
 }
 
+/*
+ * Called by non rpc-based layout drivers
+ */
+int
+pnfs_ld_write_done(struct nfs_write_data *data)
+{
+	int status;
+
+	if (!data->pnfs_error) {
+		pnfs_set_layoutcommit(data);
+		data->mds_ops->rpc_call_done(&data->task, data);
+		data->mds_ops->rpc_release(data);
+		return 0;
+	}
+
+	put_lseg(data->lseg);
+	data->lseg = NULL;
+	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
+		data->pnfs_error);
+	status = nfs_initiate_write(data, NFS_CLIENT(data->inode),
+				    data->mds_ops, NFS_FILE_SYNC);
+	return status ? : -EAGAIN;
+}
+EXPORT_SYMBOL_GPL(pnfs_ld_write_done);
+
 enum pnfs_try_status
 pnfs_try_to_write_data(struct nfs_write_data *wdata,
 			const struct rpc_call_ops *call_ops, int how)
@@ -1079,6 +1104,31 @@ pnfs_try_to_write_data(struct nfs_write_data *wdata,
 }
 
 /*
+ * Called by non rpc-based layout drivers
+ */
+int
+pnfs_ld_read_done(struct nfs_read_data *data)
+{
+	int status;
+
+	if (!data->pnfs_error) {
+		__nfs4_read_done_cb(data);
+		data->mds_ops->rpc_call_done(&data->task, data);
+		data->mds_ops->rpc_release(data);
+		return 0;
+	}
+
+	put_lseg(data->lseg);
+	data->lseg = NULL;
+	dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__,
+		data->pnfs_error);
+	status = nfs_initiate_read(data, NFS_CLIENT(data->inode),
+				   data->mds_ops);
+	return status ? : -EAGAIN;
+}
+EXPORT_SYMBOL_GPL(pnfs_ld_read_done);
+
+/*
  * Call the appropriate parallel I/O subsystem read function.
  */
 enum pnfs_try_status
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index ed167a7..8a6e1f1 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -165,6 +165,8 @@ void pnfs_roc_set_barrier(struct inode *ino, u32 barrier);
 bool pnfs_roc_drain(struct inode *ino, u32 *barrier);
 void pnfs_set_layoutcommit(struct nfs_write_data *wdata);
 int pnfs_layoutcommit_inode(struct inode *inode, bool sync);
+int pnfs_ld_write_done(struct nfs_write_data *);
+int pnfs_ld_read_done(struct nfs_read_data *);
 
 /* pnfs_dev.c */
 struct nfs4_deviceid_node {
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 7e371f7..7c8ff09 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1087,6 +1087,7 @@ struct nfs_read_data {
 	const struct rpc_call_ops *mds_ops;
 	int (*read_done_cb) (struct rpc_task *task, struct nfs_read_data *data);
 	__u64			mds_offset;
+	int			pnfs_error;
 	struct page		*page_array[NFS_PAGEVEC_SIZE];
 };
 
@@ -1112,6 +1113,7 @@ struct nfs_write_data {
 	unsigned long		timestamp;	/* For lease renewal */
 #endif
 	__u64			mds_offset;	/* Filelayout dense stripe */
+	int			pnfs_error;
 	struct page		*page_array[NFS_PAGEVEC_SIZE];
 };
 
-- 
1.7.3.4



* [PATCH v5 30/38] SQUASHME: initialize data->task on the non-rpc io done success paths
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (28 preceding siblings ...)
  2011-05-23  0:00 ` [PATCH v5 29/38] pnfs: support for non-rpc layout drivers Benny Halevy
@ 2011-05-23  0:01 ` Benny Halevy
  2011-05-23  4:58   ` Boaz Harrosh
  2011-05-23  0:01 ` [PATCH v5 31/38] pnfs-obj: osd raid engine read/write implementation Benny Halevy
                   ` (8 subsequent siblings)
  38 siblings, 1 reply; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:01 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/pnfs.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0f59802..d39fcca 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1064,6 +1064,7 @@ pnfs_ld_write_done(struct nfs_write_data *data)
 
 	if (!data->pnfs_error) {
 		pnfs_set_layoutcommit(data);
+		memset(&data->task, 0, sizeof(data->task));
 		data->mds_ops->rpc_call_done(&data->task, data);
 		data->mds_ops->rpc_release(data);
 		return 0;
@@ -1113,6 +1114,7 @@ pnfs_ld_read_done(struct nfs_read_data *data)
 
 	if (!data->pnfs_error) {
 		__nfs4_read_done_cb(data);
+		memset(&data->task, 0, sizeof(data->task));
 		data->mds_ops->rpc_call_done(&data->task, data);
 		data->mds_ops->rpc_release(data);
 		return 0;
-- 
1.7.3.4



* [PATCH v5 31/38] pnfs-obj: osd raid engine read/write implementation
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (29 preceding siblings ...)
  2011-05-23  0:01 ` [PATCH v5 30/38] SQUASHME: initialize data->task on the non-rpc io done success paths Benny Halevy
@ 2011-05-23  0:01 ` Benny Halevy
  2011-05-23 10:44   ` [PTACH] SQUASHME: pnfs-obj: Important fallout from the last rebase Boaz Harrosh
  2011-05-23  0:01 ` [PATCH v5 32/38] pnfs: layoutreturn Benny Halevy
                   ` (7 subsequent siblings)
  38 siblings, 1 reply; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:01 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

From: Boaz Harrosh <bharrosh@panasas.com>

Using the in-kernel osd library, implement read/write of data
from/to osd-objects according to the information specified
in the objects-layout.

Support striping over mirrors with a received stripe_unit.
There are, however, a few constraints:
 1. Stripe unit must be a multiple of PAGE_SIZE.
 2. Stripe length (stripe_unit * number_of_stripes) cannot be
    bigger than 32 bits.

Also support raid-groups and partial-layout. A partial layout is
one in which not all the groups are received on the wire, so it
addresses only a partial range of the file.

TODO:
  Only raid0 is supported! raid 4/5/6 support will come at a later stage.

An unsupported layout will send I/O through the MDS.
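
For readers following the striping arithmetic, here is a userspace sketch
of the offset-to-device mapping that mirrors _calc_stripe_info() in this
patch. The struct names are hypothetical, plain 64-bit division replaces
the kernel's div64_u64()/div_u64() helpers, and U is kept in 64 bits here,
whereas the patch relies on the 32-bit stripe length constraint above.

```c
#include <assert.h>
#include <stdint.h>

struct stripe_layout {			/* hypothetical geometry holder */
	uint32_t stripe_unit;		/* bytes per stripe unit */
	uint32_t group_width;		/* data devices per group */
	uint32_t mirrors_p1;		/* number of mirrors plus one */
	uint64_t group_depth;		/* stripes per group */
	uint64_t group_count;		/* number of groups */
};

struct stripe_info {
	uint64_t obj_offset;		/* offset inside the object */
	uint64_t group_length;		/* bytes left in the current group */
	uint32_t dev;			/* first component device index */
	uint32_t unit_off;		/* offset inside the stripe unit */
};

static void calc_stripe_info(const struct stripe_layout *l,
			     uint64_t file_offset, struct stripe_info *si)
{
	assert(l->stripe_unit && l->group_width && l->mirrors_p1 &&
	       l->group_depth && l->group_count);

	uint64_t U = (uint64_t)l->stripe_unit * l->group_width; /* stripe */
	uint64_t T = U * l->group_depth;	/* bytes in one group */
	uint64_t S = T * l->group_count;	/* bytes in one full cycle */
	uint64_t M = file_offset / S;		/* cycle number */
	uint64_t L = file_offset - M * S;	/* offset inside the cycle */
	uint64_t G = L / T;			/* group number */
	uint64_t H = L - G * T;			/* offset inside the group */
	uint64_t N = H / U;			/* stripe number in the group */

	si->unit_off = (uint32_t)(file_offset % l->stripe_unit);
	si->obj_offset = si->unit_off + N * l->stripe_unit +
			 M * l->group_depth * l->stripe_unit;
	si->dev = (uint32_t)((H - N * U) / l->stripe_unit +
			     G * l->group_width) * l->mirrors_p1;
	si->group_length = T - H;
}
```

For example, with stripe_unit = 4096, group_width = 2, group_depth = 2,
group_count = 2 and no mirrors, file offset 4196 lands on device 1 at
object offset 100, with 12188 bytes remaining in the group.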

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[gfp_flags]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/objlayout/objio_osd.c |  589 ++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/objlayout/objlayout.c |  254 ++++++++++++++++++
 fs/nfs/objlayout/objlayout.h |   42 +++
 3 files changed, 885 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index a71f58e..55fa5fb 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -46,6 +46,10 @@
 
 #define _LLU(x) ((unsigned long long)x)
 
+enum { BIO_MAX_PAGES_KMALLOC =
+		(PAGE_SIZE - sizeof(struct bio)) / sizeof(struct bio_vec),
+};
+
 struct objio_dev_ent {
 	struct nfs4_deviceid_node id_node;
 	struct osd_dev *od;
@@ -126,6 +130,31 @@ OBJIO_LSEG(struct pnfs_layout_segment *lseg)
 	return container_of(lseg, struct objio_segment, lseg);
 }
 
+struct objio_state;
+typedef ssize_t (*objio_done_fn)(struct objio_state *ios);
+
+struct objio_state {
+	/* Generic layer */
+	struct objlayout_io_state ol_state;
+
+	struct objio_segment *layout;
+
+	struct kref kref;
+	objio_done_fn done;
+	void *private;
+
+	unsigned long length;
+	unsigned numdevs; /* Actually used devs in this IO */
+	/* A per-device variable array of size numdevs */
+	struct _objio_per_comp {
+		struct bio *bio;
+		struct osd_request *or;
+		unsigned long length;
+		u64 offset;
+		unsigned dev;
+	} per_dev[];
+};
+
 /* Send and wait for a get_device_info of devices in the layout,
    then look them up with the osd_initiator library */
 static struct objio_dev_ent *_device_lookup(struct pnfs_layout_hdr *pnfslay,
@@ -347,6 +376,563 @@ void objio_free_lseg(struct pnfs_layout_segment *lseg)
 	kfree(objio_seg);
 }
 
+int objio_alloc_io_state(struct pnfs_layout_segment *lseg,
+			 struct objlayout_io_state **outp,
+			 gfp_t gfp_flags)
+{
+	struct objio_segment *objio_seg = OBJIO_LSEG(lseg);
+	struct objio_state *ios;
+	const unsigned first_size = sizeof(*ios) +
+				objio_seg->num_comps * sizeof(ios->per_dev[0]);
+
+	dprintk("%s: num_comps=%d\n", __func__, objio_seg->num_comps);
+	ios = kzalloc(first_size, gfp_flags);
+	if (unlikely(!ios))
+		return -ENOMEM;
+
+	ios->layout = objio_seg;
+
+	*outp = &ios->ol_state;
+	return 0;
+}
+
+void objio_free_io_state(struct objlayout_io_state *ol_state)
+{
+	struct objio_state *ios = container_of(ol_state, struct objio_state,
+					       ol_state);
+
+	kfree(ios);
+}
+
+static void _clear_bio(struct bio *bio)
+{
+	struct bio_vec *bv;
+	unsigned i;
+
+	__bio_for_each_segment(bv, bio, i, 0) {
+		unsigned this_count = bv->bv_len;
+
+		if (likely(PAGE_SIZE == this_count))
+			clear_highpage(bv->bv_page);
+		else
+			zero_user(bv->bv_page, bv->bv_offset, this_count);
+	}
+}
+
+static int _io_check(struct objio_state *ios, bool is_write)
+{
+	enum osd_err_priority oep = OSD_ERR_PRI_NO_ERROR;
+	int lin_ret = 0;
+	int i;
+
+	for (i = 0; i <  ios->numdevs; i++) {
+		struct osd_sense_info osi;
+		struct osd_request *or = ios->per_dev[i].or;
+		unsigned dev;
+		int ret;
+
+		if (!or)
+			continue;
+
+		ret = osd_req_decode_sense(or, &osi);
+		if (likely(!ret))
+			continue;
+
+		if (OSD_ERR_PRI_CLEAR_PAGES == osi.osd_err_pri) {
+			/* start read offset past end of file */
+			BUG_ON(is_write);
+			_clear_bio(ios->per_dev[i].bio);
+			dprintk("%s: start read offset past end of file "
+				"offset=0x%llx, length=0x%lx\n", __func__,
+				_LLU(ios->per_dev[i].offset),
+				ios->per_dev[i].length);
+
+			continue; /* we recovered */
+		}
+		dev = ios->per_dev[i].dev;
+
+		if (osi.osd_err_pri >= oep) {
+			oep = osi.osd_err_pri;
+			lin_ret = ret;
+		}
+	}
+
+	return lin_ret;
+}
+
+/*
+ * Common IO state helpers.
+ */
+static void _io_free(struct objio_state *ios)
+{
+	unsigned i;
+
+	for (i = 0; i < ios->numdevs; i++) {
+		struct _objio_per_comp *per_dev = &ios->per_dev[i];
+
+		if (per_dev->or) {
+			osd_end_request(per_dev->or);
+			per_dev->or = NULL;
+		}
+
+		if (per_dev->bio) {
+			bio_put(per_dev->bio);
+			per_dev->bio = NULL;
+		}
+	}
+}
+
+struct osd_dev *_io_od(struct objio_state *ios, unsigned dev)
+{
+	unsigned min_dev = ios->layout->comps_index;
+	unsigned max_dev = min_dev + ios->layout->num_comps;
+
+	BUG_ON(dev < min_dev || max_dev <= dev);
+	return ios->layout->ods[dev - min_dev]->od;
+}
+
+struct _striping_info {
+	u64 obj_offset;
+	u64 group_length;
+	unsigned dev;
+	unsigned unit_off;
+};
+
+static void _calc_stripe_info(struct objio_state *ios, u64 file_offset,
+			      struct _striping_info *si)
+{
+	u32	stripe_unit = ios->layout->stripe_unit;
+	u32	group_width = ios->layout->group_width;
+	u64	group_depth = ios->layout->group_depth;
+	u32	U = stripe_unit * group_width;
+
+	u64	T = U * group_depth;
+	u64	S = T * ios->layout->group_count;
+	u64	M = div64_u64(file_offset, S);
+
+	/*
+	G = (L - (M * S)) / T
+	H = (L - (M * S)) % T
+	*/
+	u64	LmodU = file_offset - M * S;
+	u32	G = div64_u64(LmodU, T);
+	u64	H = LmodU - G * T;
+
+	u32	N = div_u64(H, U);
+
+	div_u64_rem(file_offset, stripe_unit, &si->unit_off);
+	si->obj_offset = si->unit_off + (N * stripe_unit) +
+				  (M * group_depth * stripe_unit);
+
+	/* "H - (N * U)" is just "H % U" so it's bound to u32 */
+	si->dev = (u32)(H - (N * U)) / stripe_unit + G * group_width;
+	si->dev *= ios->layout->mirrors_p1;
+
+	si->group_length = T - H;
+}
+
+static int _add_stripe_unit(struct objio_state *ios,  unsigned *cur_pg,
+		unsigned pgbase, struct _objio_per_comp *per_dev, int cur_len,
+		gfp_t gfp_flags)
+{
+	unsigned pg = *cur_pg;
+	struct request_queue *q =
+			osd_request_queue(_io_od(ios, per_dev->dev));
+
+	per_dev->length += cur_len;
+
+	if (per_dev->bio == NULL) {
+		unsigned stripes = ios->layout->num_comps /
+						     ios->layout->mirrors_p1;
+		unsigned pages_in_stripe = stripes *
+				      (ios->layout->stripe_unit / PAGE_SIZE);
+		unsigned bio_size = (ios->ol_state.nr_pages + pages_in_stripe) /
+				    stripes;
+
+		per_dev->bio = bio_kmalloc(gfp_flags, bio_size);
+		if (unlikely(!per_dev->bio)) {
+			dprintk("Failed to allocate BIO size=%u\n", bio_size);
+			return -ENOMEM;
+		}
+	}
+
+	while (cur_len > 0) {
+		unsigned pglen = min_t(unsigned, PAGE_SIZE - pgbase, cur_len);
+		unsigned added_len;
+
+		BUG_ON(ios->ol_state.nr_pages <= pg);
+		cur_len -= pglen;
+
+		added_len = bio_add_pc_page(q, per_dev->bio,
+					ios->ol_state.pages[pg], pglen, pgbase);
+		if (unlikely(pglen != added_len))
+			return -ENOMEM;
+		pgbase = 0;
+		++pg;
+	}
+	BUG_ON(cur_len);
+
+	*cur_pg = pg;
+	return 0;
+}
+
+static int _prepare_one_group(struct objio_state *ios, u64 length,
+			      struct _striping_info *si, unsigned *last_pg,
+			      gfp_t gfp_flags)
+{
+	unsigned stripe_unit = ios->layout->stripe_unit;
+	unsigned mirrors_p1 = ios->layout->mirrors_p1;
+	unsigned devs_in_group = ios->layout->group_width * mirrors_p1;
+	unsigned dev = si->dev;
+	unsigned first_dev = dev - (dev % devs_in_group);
+	unsigned max_comp = ios->numdevs ? ios->numdevs - mirrors_p1 : 0;
+	unsigned cur_pg = *last_pg;
+	int ret = 0;
+
+	while (length) {
+		struct _objio_per_comp *per_dev = &ios->per_dev[dev];
+		unsigned cur_len, page_off = 0;
+
+		if (!per_dev->length) {
+			per_dev->dev = dev;
+			if (dev < si->dev) {
+				per_dev->offset = si->obj_offset + stripe_unit -
+								   si->unit_off;
+				cur_len = stripe_unit;
+			} else if (dev == si->dev) {
+				per_dev->offset = si->obj_offset;
+				cur_len = stripe_unit - si->unit_off;
+				page_off = si->unit_off & ~PAGE_MASK;
+				BUG_ON(page_off &&
+				      (page_off != ios->ol_state.pgbase));
+			} else { /* dev > si->dev */
+				per_dev->offset = si->obj_offset - si->unit_off;
+				cur_len = stripe_unit;
+			}
+
+			if (max_comp < dev)
+				max_comp = dev;
+		} else {
+			cur_len = stripe_unit;
+		}
+		if (cur_len >= length)
+			cur_len = length;
+
+		ret = _add_stripe_unit(ios, &cur_pg, page_off , per_dev,
+				       cur_len, gfp_flags);
+		if (unlikely(ret))
+			goto out;
+
+		dev += mirrors_p1;
+		dev = (dev % devs_in_group) + first_dev;
+
+		length -= cur_len;
+		ios->length += cur_len;
+	}
+out:
+	ios->numdevs = max_comp + mirrors_p1;
+	*last_pg = cur_pg;
+	return ret;
+}
+
+static int _io_rw_pagelist(struct objio_state *ios, gfp_t gfp_flags)
+{
+	u64 length = ios->ol_state.count;
+	u64 offset = ios->ol_state.offset;
+	struct _striping_info si;
+	unsigned last_pg = 0;
+	int ret = 0;
+
+	while (length) {
+		_calc_stripe_info(ios, offset, &si);
+
+		if (length < si.group_length)
+			si.group_length = length;
+
+		ret = _prepare_one_group(ios, si.group_length, &si, &last_pg, gfp_flags);
+		if (unlikely(ret))
+			goto out;
+
+		offset += si.group_length;
+		length -= si.group_length;
+	}
+
+out:
+	if (!ios->length)
+		return ret;
+
+	return 0;
+}
+
+static ssize_t _sync_done(struct objio_state *ios)
+{
+	struct completion *waiting = ios->private;
+
+	complete(waiting);
+	return 0;
+}
+
+static void _last_io(struct kref *kref)
+{
+	struct objio_state *ios = container_of(kref, struct objio_state, kref);
+
+	ios->done(ios);
+}
+
+static void _done_io(struct osd_request *or, void *p)
+{
+	struct objio_state *ios = p;
+
+	kref_put(&ios->kref, _last_io);
+}
+
+static ssize_t _io_exec(struct objio_state *ios)
+{
+	DECLARE_COMPLETION_ONSTACK(wait);
+	ssize_t status = 0; /* sync status */
+	unsigned i;
+	objio_done_fn saved_done_fn = ios->done;
+	bool sync = ios->ol_state.sync;
+
+	if (sync) {
+		ios->done = _sync_done;
+		ios->private = &wait;
+	}
+
+	kref_init(&ios->kref);
+
+	for (i = 0; i < ios->numdevs; i++) {
+		struct osd_request *or = ios->per_dev[i].or;
+
+		if (!or)
+			continue;
+
+		kref_get(&ios->kref);
+		osd_execute_request_async(or, _done_io, ios);
+	}
+
+	kref_put(&ios->kref, _last_io);
+
+	if (sync) {
+		wait_for_completion(&wait);
+		status = saved_done_fn(ios);
+	}
+
+	return status;
+}
+
+/*
+ * read
+ */
+static ssize_t _read_done(struct objio_state *ios)
+{
+	ssize_t status;
+	int ret = _io_check(ios, false);
+
+	_io_free(ios);
+
+	if (likely(!ret))
+		status = ios->length;
+	else
+		status = ret;
+
+	objlayout_read_done(&ios->ol_state, status, ios->ol_state.sync);
+	return status;
+}
+
+static int _read_mirrors(struct objio_state *ios, unsigned cur_comp)
+{
+	struct osd_request *or = NULL;
+	struct _objio_per_comp *per_dev = &ios->per_dev[cur_comp];
+	unsigned dev = per_dev->dev;
+	struct pnfs_osd_object_cred *cred =
+			&ios->layout->comps[dev];
+	struct osd_obj_id obj = {
+		.partition = cred->oc_object_id.oid_partition_id,
+		.id = cred->oc_object_id.oid_object_id,
+	};
+	int ret;
+
+	or = osd_start_request(_io_od(ios, dev), GFP_KERNEL);
+	if (unlikely(!or)) {
+		ret = -ENOMEM;
+		goto err;
+	}
+	per_dev->or = or;
+
+	osd_req_read(or, &obj, per_dev->offset, per_dev->bio, per_dev->length);
+
+	ret = osd_finalize_request(or, 0, cred->oc_cap.cred, NULL);
+	if (ret) {
+		dprintk("%s: Failed to osd_finalize_request() => %d\n",
+			__func__, ret);
+		goto err;
+	}
+
+	dprintk("%s:[%d] dev=%d obj=0x%llx start=0x%llx length=0x%lx\n",
+		__func__, cur_comp, dev, obj.id, _LLU(per_dev->offset),
+		per_dev->length);
+
+err:
+	return ret;
+}
+
+static ssize_t _read_exec(struct objio_state *ios)
+{
+	unsigned i;
+	int ret;
+
+	for (i = 0; i < ios->numdevs; i += ios->layout->mirrors_p1) {
+		if (!ios->per_dev[i].length)
+			continue;
+		ret = _read_mirrors(ios, i);
+		if (unlikely(ret))
+			goto err;
+	}
+
+	ios->done = _read_done;
+	return _io_exec(ios); /* In sync mode exec returns the io status */
+
+err:
+	_io_free(ios);
+	return ret;
+}
+
+ssize_t objio_read_pagelist(struct objlayout_io_state *ol_state)
+{
+	struct objio_state *ios = container_of(ol_state, struct objio_state,
+					       ol_state);
+	int ret;
+
+	ret = _io_rw_pagelist(ios, GFP_KERNEL);
+	if (unlikely(ret))
+		return ret;
+
+	return _read_exec(ios);
+}
+
+/*
+ * write
+ */
+static ssize_t _write_done(struct objio_state *ios)
+{
+	ssize_t status;
+	int ret = _io_check(ios, true);
+
+	_io_free(ios);
+
+	if (likely(!ret)) {
+		/* FIXME: should be based on the OSD's persistence model
+		 * See OSD2r05 Section 4.13 Data persistence model */
+		ios->ol_state.committed = NFS_FILE_SYNC;
+		status = ios->length;
+	} else {
+		status = ret;
+	}
+
+	objlayout_write_done(&ios->ol_state, status, ios->ol_state.sync);
+	return status;
+}
+
+static int _write_mirrors(struct objio_state *ios, unsigned cur_comp)
+{
+	struct _objio_per_comp *master_dev = &ios->per_dev[cur_comp];
+	unsigned dev = ios->per_dev[cur_comp].dev;
+	unsigned last_comp = cur_comp + ios->layout->mirrors_p1;
+	int ret;
+
+	for (; cur_comp < last_comp; ++cur_comp, ++dev) {
+		struct osd_request *or = NULL;
+		struct pnfs_osd_object_cred *cred =
+					&ios->layout->comps[dev];
+		struct osd_obj_id obj = {
+			.partition = cred->oc_object_id.oid_partition_id,
+			.id = cred->oc_object_id.oid_object_id,
+		};
+		struct _objio_per_comp *per_dev = &ios->per_dev[cur_comp];
+		struct bio *bio;
+
+		or = osd_start_request(_io_od(ios, dev), GFP_NOFS);
+		if (unlikely(!or)) {
+			ret = -ENOMEM;
+			goto err;
+		}
+		per_dev->or = or;
+
+		if (per_dev != master_dev) {
+			bio = bio_kmalloc(GFP_NOFS,
+					  master_dev->bio->bi_max_vecs);
+			if (unlikely(!bio)) {
+				dprintk("Failed to allocate BIO size=%u\n",
+					master_dev->bio->bi_max_vecs);
+				ret = -ENOMEM;
+				goto err;
+			}
+
+			__bio_clone(bio, master_dev->bio);
+			bio->bi_bdev = NULL;
+			bio->bi_next = NULL;
+			per_dev->bio = bio;
+			per_dev->dev = dev;
+			per_dev->length = master_dev->length;
+			per_dev->offset = master_dev->offset;
+		} else {
+			bio = master_dev->bio;
+			bio->bi_rw |= REQ_WRITE;
+		}
+
+		osd_req_write(or, &obj, per_dev->offset, bio, per_dev->length);
+
+		ret = osd_finalize_request(or, 0, cred->oc_cap.cred, NULL);
+		if (ret) {
+			dprintk("%s: Failed to osd_finalize_request() => %d\n",
+				__func__, ret);
+			goto err;
+		}
+
+		dprintk("%s:[%d] dev=%d obj=0x%llx start=0x%llx length=0x%lx\n",
+			__func__, cur_comp, dev, obj.id, _LLU(per_dev->offset),
+			per_dev->length);
+	}
+
+err:
+	return ret;
+}
+
+static ssize_t _write_exec(struct objio_state *ios)
+{
+	unsigned i;
+	int ret;
+
+	for (i = 0; i < ios->numdevs; i += ios->layout->mirrors_p1) {
+		if (!ios->per_dev[i].length)
+			continue;
+		ret = _write_mirrors(ios, i);
+		if (unlikely(ret))
+			goto err;
+	}
+
+	ios->done = _write_done;
+	return _io_exec(ios); /* In sync mode exec returns the io->status */
+
+err:
+	_io_free(ios);
+	return ret;
+}
+
+ssize_t objio_write_pagelist(struct objlayout_io_state *ol_state, bool stable)
+{
+	struct objio_state *ios = container_of(ol_state, struct objio_state,
+					       ol_state);
+	int ret;
+
+	/* TODO: ios->stable = stable; */
+	ret = _io_rw_pagelist(ios, GFP_NOFS);
+	if (unlikely(ret))
+		return ret;
+
+	return _write_exec(ios);
+}
 
 static struct pnfs_layoutdriver_type objlayout_type = {
 	.id = LAYOUT_OSD2_OBJECTS,
@@ -358,6 +944,9 @@ static struct pnfs_layoutdriver_type objlayout_type = {
 	.alloc_lseg              = objlayout_alloc_lseg,
 	.free_lseg               = objlayout_free_lseg,
 
+	.read_pagelist           = objlayout_read_pagelist,
+	.write_pagelist          = objlayout_write_pagelist,
+
 	.free_deviceid_node	 = objio_free_deviceid_node,
 };
 
diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
index f14b4da..5157ef6 100644
--- a/fs/nfs/objlayout/objlayout.c
+++ b/fs/nfs/objlayout/objlayout.c
@@ -129,6 +129,260 @@ objlayout_free_lseg(struct pnfs_layout_segment *lseg)
 }
 
 /*
+ * I/O Operations
+ */
+static inline u64
+end_offset(u64 start, u64 len)
+{
+	u64 end;
+
+	end = start + len;
+	return end >= start ? end : NFS4_MAX_UINT64;
+}
+
+/* last octet in a range */
+static inline u64
+last_byte_offset(u64 start, u64 len)
+{
+	u64 end;
+
+	BUG_ON(!len);
+	end = start + len;
+	return end > start ? end - 1 : NFS4_MAX_UINT64;
+}
+
+static struct objlayout_io_state *
+objlayout_alloc_io_state(struct pnfs_layout_hdr *pnfs_layout_type,
+			struct page **pages,
+			unsigned pgbase,
+			loff_t offset,
+			size_t count,
+			struct pnfs_layout_segment *lseg,
+			void *rpcdata,
+			gfp_t gfp_flags)
+{
+	struct objlayout_io_state *state;
+	u64 lseg_end_offset;
+
+	dprintk("%s: allocating io_state\n", __func__);
+	if (objio_alloc_io_state(lseg, &state, gfp_flags))
+		return NULL;
+
+	BUG_ON(offset < lseg->pls_range.offset);
+	lseg_end_offset = end_offset(lseg->pls_range.offset,
+				     lseg->pls_range.length);
+	BUG_ON(offset >= lseg_end_offset);
+	if (offset + count > lseg_end_offset) {
+		count = lseg->pls_range.length -
+				(offset - lseg->pls_range.offset);
+		dprintk("%s: truncated count %Zu\n", __func__, count);
+	}
+
+	if (pgbase >= PAGE_SIZE) {
+		pages += pgbase >> PAGE_SHIFT;
+		pgbase &= ~PAGE_MASK;
+	}
+
+	state->lseg = lseg;
+	state->rpcdata = rpcdata;
+	state->pages = pages;
+	state->pgbase = pgbase;
+	state->nr_pages = (pgbase + count + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	state->offset = offset;
+	state->count = count;
+	state->sync = 0;
+
+	return state;
+}
+
+static void
+objlayout_free_io_state(struct objlayout_io_state *state)
+{
+	dprintk("%s: freeing io_state\n", __func__);
+	if (unlikely(!state))
+		return;
+
+	objio_free_io_state(state);
+}
+
+/*
+ * I/O done common code
+ */
+static void
+objlayout_iodone(struct objlayout_io_state *state)
+{
+	dprintk("%s: state %p\n", __func__, state);
+
+	objlayout_free_io_state(state);
+}
+
+/* Function scheduled on a workqueue to call pnfs_ld_read_done().
+ * This is because the osd completion is called with interrupts off
+ * from the block layer
+ */
+static void _rpc_read_complete(struct work_struct *work)
+{
+	struct rpc_task *task;
+	struct nfs_read_data *rdata;
+
+	dprintk("%s enter\n", __func__);
+	task = container_of(work, struct rpc_task, u.tk_work);
+	rdata = container_of(task, struct nfs_read_data, task);
+
+	pnfs_ld_read_done(rdata);
+}
+
+void
+objlayout_read_done(struct objlayout_io_state *state, ssize_t status, bool sync)
+{
+	int eof = state->eof;
+	struct nfs_read_data *rdata;
+
+	state->status = status;
+	dprintk("%s: Begin status=%Zd eof=%d\n", __func__, status, eof);
+	rdata = state->rpcdata;
+	rdata->task.tk_status = status;
+	if (status >= 0) {
+		rdata->res.count = status;
+		rdata->res.eof = eof;
+	}
+	objlayout_iodone(state);
+	/* must not use state after this point */
+
+	if (sync)
+		pnfs_ld_read_done(rdata);
+	else {
+		INIT_WORK(&rdata->task.u.tk_work, _rpc_read_complete);
+		schedule_work(&rdata->task.u.tk_work);
+	}
+}
+
+/*
+ * Perform sync or async reads.
+ */
+enum pnfs_try_status
+objlayout_read_pagelist(struct nfs_read_data *rdata)
+{
+	loff_t offset = rdata->args.offset;
+	size_t count = rdata->args.count;
+	struct objlayout_io_state *state;
+	ssize_t status = 0;
+	loff_t eof;
+
+	dprintk("%s: Begin inode %p offset %llu count %d\n",
+		__func__, rdata->inode, offset, (int)count);
+
+	eof = i_size_read(rdata->inode);
+	if (unlikely(offset + count > eof)) {
+		if (offset >= eof) {
+			status = 0;
+			rdata->res.count = 0;
+			rdata->res.eof = 1;
+			goto out;
+		}
+		count = eof - offset;
+	}
+
+	state = objlayout_alloc_io_state(NFS_I(rdata->inode)->layout,
+					 rdata->args.pages, rdata->args.pgbase,
+					 offset, count,
+					 rdata->lseg, rdata,
+					 GFP_KERNEL);
+	if (unlikely(!state)) {
+		status = -ENOMEM;
+		goto out;
+	}
+
+	state->eof = state->offset + state->count >= eof;
+
+	status = objio_read_pagelist(state);
+ out:
+	dprintk("%s: Return status %Zd\n", __func__, status);
+	rdata->pnfs_error = status;
+	return PNFS_ATTEMPTED;
+}
+
+/* Function scheduled on a workqueue to call pnfs_ld_write_done().
+ * This is because the osd completion is called with interrupts off
+ * from the block layer
+ */
+static void _rpc_write_complete(struct work_struct *work)
+{
+	struct rpc_task *task;
+	struct nfs_write_data *wdata;
+
+	dprintk("%s enter\n", __func__);
+	task = container_of(work, struct rpc_task, u.tk_work);
+	wdata = container_of(task, struct nfs_write_data, task);
+
+	pnfs_ld_write_done(wdata);
+}
+
+void
+objlayout_write_done(struct objlayout_io_state *state, ssize_t status,
+		     bool sync)
+{
+	struct nfs_write_data *wdata;
+
+	dprintk("%s: Begin\n", __func__);
+	wdata = state->rpcdata;
+	state->status = status;
+	wdata->task.tk_status = status;
+	if (status >= 0) {
+		wdata->res.count = status;
+		wdata->verf.committed = state->committed;
+		dprintk("%s: Return status %d committed %d\n",
+			__func__, wdata->task.tk_status,
+			wdata->verf.committed);
+	} else
+		dprintk("%s: Return status %d\n",
+			__func__, wdata->task.tk_status);
+	objlayout_iodone(state);
+	/* must not use state after this point */
+
+	if (sync)
+		pnfs_ld_write_done(wdata);
+	else {
+		INIT_WORK(&wdata->task.u.tk_work, _rpc_write_complete);
+		schedule_work(&wdata->task.u.tk_work);
+	}
+}
+
+/*
+ * Perform sync or async writes.
+ */
+enum pnfs_try_status
+objlayout_write_pagelist(struct nfs_write_data *wdata,
+			 int how)
+{
+	struct objlayout_io_state *state;
+	ssize_t status;
+
+	dprintk("%s: Begin inode %p offset %llu count %u\n",
+		__func__, wdata->inode, wdata->args.offset, wdata->args.count);
+
+	state = objlayout_alloc_io_state(NFS_I(wdata->inode)->layout,
+					 wdata->args.pages,
+					 wdata->args.pgbase,
+					 wdata->args.offset,
+					 wdata->args.count,
+					 wdata->lseg, wdata,
+					 GFP_NOFS);
+	if (unlikely(!state)) {
+		status = -ENOMEM;
+		goto out;
+	}
+
+	state->sync = how & FLUSH_SYNC;
+
+	status = objio_write_pagelist(state, how & FLUSH_STABLE);
+ out:
+	dprintk("%s: Return status %Zd\n", __func__, status);
+	wdata->pnfs_error = status;
+	return PNFS_ATTEMPTED;
+}
+
+/*
  * Get Device Info API for io engines
  */
 struct objlayout_deviceinfo {
diff --git a/fs/nfs/objlayout/objlayout.h b/fs/nfs/objlayout/objlayout.h
index fa02621..9a405e8 100644
--- a/fs/nfs/objlayout/objlayout.h
+++ b/fs/nfs/objlayout/objlayout.h
@@ -59,6 +59,26 @@ OBJLAYOUT(struct pnfs_layout_hdr *lo)
 }
 
 /*
+ * per-I/O operation state
+ * embedded in objects provider io_state data structure
+ */
+struct objlayout_io_state {
+	struct pnfs_layout_segment *lseg;
+
+	struct page **pages;
+	unsigned pgbase;
+	unsigned nr_pages;
+	unsigned long count;
+	loff_t offset;
+	bool sync;
+
+	void *rpcdata;
+	int status;             /* res */
+	int eof;                /* res */
+	int committed;          /* res */
+};
+
+/*
  * Raid engine I/O API
  */
 extern int objio_alloc_lseg(struct pnfs_layout_segment **outp,
@@ -68,9 +88,24 @@ extern int objio_alloc_lseg(struct pnfs_layout_segment **outp,
 	gfp_t gfp_flags);
 extern void objio_free_lseg(struct pnfs_layout_segment *lseg);
 
+extern int objio_alloc_io_state(
+	struct pnfs_layout_segment *lseg,
+	struct objlayout_io_state **outp,
+	gfp_t gfp_flags);
+extern void objio_free_io_state(struct objlayout_io_state *state);
+
+extern ssize_t objio_read_pagelist(struct objlayout_io_state *ol_state);
+extern ssize_t objio_write_pagelist(struct objlayout_io_state *ol_state,
+				    bool stable);
+
 /*
  * callback API
  */
+extern void objlayout_read_done(struct objlayout_io_state *state,
+				ssize_t status, bool sync);
+extern void objlayout_write_done(struct objlayout_io_state *state,
+				 ssize_t status, bool sync);
+
 extern int objlayout_get_deviceinfo(struct pnfs_layout_hdr *pnfslay,
 	struct nfs4_deviceid *d_id, struct pnfs_osd_deviceaddr **deviceaddr,
 	gfp_t gfp_flags);
@@ -89,4 +124,11 @@ extern struct pnfs_layout_segment *objlayout_alloc_lseg(
 	gfp_t gfp_flags);
 extern void objlayout_free_lseg(struct pnfs_layout_segment *);
 
+extern enum pnfs_try_status objlayout_read_pagelist(
+	struct nfs_read_data *);
+
+extern enum pnfs_try_status objlayout_write_pagelist(
+	struct nfs_write_data *,
+	int how);
+
 #endif /* _OBJLAYOUT_H */
-- 
1.7.3.4


* [PATCH v5 32/38] pnfs: layoutreturn
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (30 preceding siblings ...)
  2011-05-23  0:01 ` [PATCH v5 31/38] pnfs-obj: osd raid engine read/write implementation Benny Halevy
@ 2011-05-23  0:01 ` Benny Halevy
  2011-05-23  0:02 ` [PATCH v5 33/38] SQUASHME: pnfs: fix layout stateid in layoutreturn args Benny Halevy
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:01 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

NFSv4.1 LAYOUTRETURN implementation

Layout-type payload encoding is not yet supported; the opaque
lrf_body is always sent empty.

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
[call pnfs_return_layout right before pnfs_destroy_layout]
[remove assert_spin_locked from pnfs_clear_lseg_list]
[remove wait parameter from the layoutreturn path.]
[remove return_type field from nfs4_layoutreturn_args]
[remove range from nfs4_layoutreturn_args]
[no need to send layoutcommit from _pnfs_return_layout]
[don't wait on sync layoutreturn]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
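
For reviewers, the LAYOUTRETURN operation body produced by encode_layoutreturn() in this patch can be sketched in userspace as below. This is only an illustration under stated assumptions: the sk_ helpers and SK_ constants are hypothetical names, the numeric values are taken from RFC 5661, and the compound header, SEQUENCE and PUTFH that precede the operation are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from RFC 5661, redefined so the sketch builds outside the kernel. */
enum {
	SK_OP_LAYOUTRETURN = 51,
	SK_IOMODE_ANY = 3,
	SK_RETURN_FILE = 1,
	SK_STATEID_SIZE = 16,
	SK_LAYOUTRETURN_BYTES = 56,
};

/* Hypothetical helper: store a 32-bit value in network byte order. */
static uint8_t *sk_put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	return p + 4;
}

/* Hypothetical helper: store a 64-bit XDR hyper. */
static uint8_t *sk_put_be64(uint8_t *p, uint64_t v)
{
	p = sk_put_be32(p, (uint32_t)(v >> 32));
	return sk_put_be32(p, (uint32_t)v);
}

/* Encode the operation in the same field order as the patch.
 * Returns the number of bytes written. */
static size_t sk_encode_layoutreturn(uint8_t *buf, size_t cap,
				     uint32_t reclaim, uint32_t layout_type,
				     const uint8_t *stateid)
{
	uint8_t *p = buf;

	assert(cap >= SK_LAYOUTRETURN_BYTES);
	p = sk_put_be32(p, SK_OP_LAYOUTRETURN);
	p = sk_put_be32(p, reclaim);
	p = sk_put_be32(p, layout_type);
	p = sk_put_be32(p, SK_IOMODE_ANY);
	p = sk_put_be32(p, SK_RETURN_FILE);
	p = sk_put_be64(p, 0);			/* offset: start of file */
	p = sk_put_be64(p, UINT64_MAX);		/* length: whole file */
	memcpy(p, stateid, SK_STATEID_SIZE);
	p += SK_STATEID_SIZE;
	p = sk_put_be32(p, 0);			/* lrf_body: empty opaque */
	return (size_t)(p - buf);
}
```

The 56 bytes break down as five 32-bit words, two hypers, the 16-byte stateid and the zero length word of the empty lrf_body.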
 fs/nfs/inode.c          |    3 +-
 fs/nfs/nfs4proc.c       |   82 ++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4xdr.c        |  111 ++++++++++++++++++++++++++++++++++++++++++++---
 fs/nfs/pnfs.c           |   56 ++++++++++++++++++++++++
 fs/nfs/pnfs.h           |   18 ++++++++
 include/linux/nfs4.h    |    1 +
 include/linux/nfs_xdr.h |   21 +++++++++
 7 files changed, 285 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 57bb31a..e9c6d9f 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1424,9 +1424,10 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
  */
 void nfs4_evict_inode(struct inode *inode)
 {
-	pnfs_destroy_layout(NFS_I(inode));
 	truncate_inode_pages(&inode->i_data, 0);
 	end_writeback(inode);
+	pnfs_return_layout(inode);
+	pnfs_destroy_layout(NFS_I(inode));
 	/* If we are holding a delegation, return it! */
 	nfs_inode_return_delegation_noreclaim(inode);
 	/* First call standard NFS clear_inode() code */
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 92c8bc4..5b4124e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5673,6 +5673,88 @@ int nfs4_proc_layoutget(struct nfs4_layoutget *lgp)
 	return status;
 }
 
+static void
+nfs4_layoutreturn_prepare(struct rpc_task *task, void *calldata)
+{
+	struct nfs4_layoutreturn *lrp = calldata;
+
+	dprintk("--> %s\n", __func__);
+	if (nfs41_setup_sequence(lrp->clp->cl_session, &lrp->args.seq_args,
+				&lrp->res.seq_res, 0, task))
+		return;
+	rpc_call_start(task);
+}
+
+static void nfs4_layoutreturn_done(struct rpc_task *task, void *calldata)
+{
+	struct nfs4_layoutreturn *lrp = calldata;
+	struct nfs_server *server;
+
+	dprintk("--> %s\n", __func__);
+
+	if (!nfs4_sequence_done(task, &lrp->res.seq_res))
+		return;
+
+	server = NFS_SERVER(lrp->args.inode);
+	if (nfs4_async_handle_error(task, server, NULL) == -EAGAIN) {
+		nfs_restart_rpc(task, lrp->clp);
+		return;
+	}
+	if (task->tk_status == 0) {
+		struct pnfs_layout_hdr *lo = NFS_I(lrp->args.inode)->layout;
+
+		if (lrp->res.lrs_present) {
+			spin_lock(&lo->plh_inode->i_lock);
+			pnfs_set_layout_stateid(lo, &lrp->res.stateid, true);
+			spin_unlock(&lo->plh_inode->i_lock);
+		} else
+			BUG_ON(!list_empty(&lo->plh_segs));
+	}
+	dprintk("<-- %s\n", __func__);
+}
+
+static void nfs4_layoutreturn_release(void *calldata)
+{
+	struct nfs4_layoutreturn *lrp = calldata;
+
+	dprintk("--> %s\n", __func__);
+	put_layout_hdr(NFS_I(lrp->args.inode)->layout);
+	kfree(calldata);
+	dprintk("<-- %s\n", __func__);
+}
+
+static const struct rpc_call_ops nfs4_layoutreturn_call_ops = {
+	.rpc_call_prepare = nfs4_layoutreturn_prepare,
+	.rpc_call_done = nfs4_layoutreturn_done,
+	.rpc_release = nfs4_layoutreturn_release,
+};
+
+int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp)
+{
+	struct rpc_task *task;
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_LAYOUTRETURN],
+		.rpc_argp = &lrp->args,
+		.rpc_resp = &lrp->res,
+	};
+	struct rpc_task_setup task_setup_data = {
+		.rpc_client = lrp->clp->cl_rpcclient,
+		.rpc_message = &msg,
+		.callback_ops = &nfs4_layoutreturn_call_ops,
+		.callback_data = lrp,
+	};
+	int status;
+
+	dprintk("--> %s\n", __func__);
+	task = rpc_run_task(&task_setup_data);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	status = task->tk_status;
+	dprintk("<-- %s status=%d\n", __func__, status);
+	rpc_put_task(task);
+	return status;
+}
+
 static int
 _nfs4_proc_getdeviceinfo(struct nfs_server *server, struct pnfs_device *pdev)
 {
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index c3ccd2c..e53d7d8 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -338,7 +338,11 @@ static int nfs4_stat_to_errno(int);
 				1 /* layoutupdate4 layout type */ + \
 				1 /* NULL filelayout layoutupdate4 payload */)
 #define decode_layoutcommit_maxsz (op_decode_hdr_maxsz + 3)
-
+#define encode_layoutreturn_maxsz (8 + op_encode_hdr_maxsz + \
+				encode_stateid_maxsz + \
+				1 /* FIXME: opaque lrf_body always empty at the moment */)
+#define decode_layoutreturn_maxsz (op_decode_hdr_maxsz + \
+				1 + decode_stateid_maxsz)
 #else /* CONFIG_NFS_V4_1 */
 #define encode_sequence_maxsz	0
 #define decode_sequence_maxsz	0
@@ -760,7 +764,14 @@ static int nfs4_stat_to_errno(int);
 				decode_putfh_maxsz + \
 				decode_layoutcommit_maxsz + \
 				decode_getattr_maxsz)
-
+#define NFS4_enc_layoutreturn_sz (compound_encode_hdr_maxsz + \
+				encode_sequence_maxsz + \
+				encode_putfh_maxsz + \
+				encode_layoutreturn_maxsz)
+#define NFS4_dec_layoutreturn_sz (compound_decode_hdr_maxsz + \
+				decode_sequence_maxsz + \
+				decode_putfh_maxsz + \
+				decode_layoutreturn_maxsz)
 
 const u32 nfs41_maxwrite_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
 				      compound_encode_hdr_maxsz +
@@ -1889,6 +1900,31 @@ encode_layoutcommit(struct xdr_stream *xdr,
 	hdr->replen += decode_layoutcommit_maxsz;
 	return 0;
 }
+
+static void
+encode_layoutreturn(struct xdr_stream *xdr,
+		    const struct nfs4_layoutreturn_args *args,
+		    struct compound_hdr *hdr)
+{
+	__be32 *p;
+
+	p = reserve_space(xdr, 20);
+	*p++ = cpu_to_be32(OP_LAYOUTRETURN);
+	*p++ = cpu_to_be32(args->reclaim);
+	*p++ = cpu_to_be32(args->layout_type);
+	*p++ = cpu_to_be32(IOMODE_ANY);
+	*p = cpu_to_be32(RETURN_FILE);
+	p = reserve_space(xdr, 16 + NFS4_STATEID_SIZE);
+	p = xdr_encode_hyper(p, 0);
+	p = xdr_encode_hyper(p, NFS4_MAX_UINT64);
+	spin_lock(&args->inode->i_lock);
+	xdr_encode_opaque_fixed(p, &NFS_I(args->inode)->layout->plh_stateid.data, NFS4_STATEID_SIZE);
+	spin_unlock(&args->inode->i_lock);
+	p = reserve_space(xdr, 4);
+	*p = cpu_to_be32(0);
+	hdr->nops++;
+	hdr->replen += decode_layoutreturn_maxsz;
+}
 #endif /* CONFIG_NFS_V4_1 */
 
 /*
@@ -2706,9 +2742,9 @@ static void nfs4_xdr_enc_layoutget(struct rpc_rqst *req,
 /*
  *  Encode LAYOUTCOMMIT request
  */
-static int nfs4_xdr_enc_layoutcommit(struct rpc_rqst *req,
-				     struct xdr_stream *xdr,
-				     struct nfs4_layoutcommit_args *args)
+static void nfs4_xdr_enc_layoutcommit(struct rpc_rqst *req,
+				      struct xdr_stream *xdr,
+				      struct nfs4_layoutcommit_args *args)
 {
 	struct compound_hdr hdr = {
 		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
@@ -2720,7 +2756,24 @@ static int nfs4_xdr_enc_layoutcommit(struct rpc_rqst *req,
 	encode_layoutcommit(xdr, args, &hdr);
 	encode_getfattr(xdr, args->bitmask, &hdr);
 	encode_nops(&hdr);
-	return 0;
+}
+
+/*
+ * Encode LAYOUTRETURN request
+ */
+static void nfs4_xdr_enc_layoutreturn(struct rpc_rqst *req,
+				      struct xdr_stream *xdr,
+				      struct nfs4_layoutreturn_args *args)
+{
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->seq_args, &hdr);
+	encode_putfh(xdr, NFS_FH(args->inode), &hdr);
+	encode_layoutreturn(xdr, args, &hdr);
+	encode_nops(&hdr);
 }
 #endif /* CONFIG_NFS_V4_1 */
 
@@ -5203,6 +5256,27 @@ out_overflow:
 	return -EIO;
 }
 
+static int decode_layoutreturn(struct xdr_stream *xdr,
+			       struct nfs4_layoutreturn_res *res)
+{
+	__be32 *p;
+	int status;
+
+	status = decode_op_hdr(xdr, OP_LAYOUTRETURN);
+	if (status)
+		return status;
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		goto out_overflow;
+	res->lrs_present = be32_to_cpup(p);
+	if (res->lrs_present)
+		status = decode_stateid(xdr, &res->stateid);
+	return status;
+out_overflow:
+	print_overflow_msg(__func__, xdr);
+	return -EIO;
+}
+
 static int decode_layoutcommit(struct xdr_stream *xdr,
 			       struct rpc_rqst *req,
 			       struct nfs4_layoutcommit_res *res)
@@ -6320,6 +6394,30 @@ out:
 }
 
 /*
+ * Decode LAYOUTRETURN response
+ */
+static int nfs4_xdr_dec_layoutreturn(struct rpc_rqst *rqstp,
+				     struct xdr_stream *xdr,
+				     struct nfs4_layoutreturn_res *res)
+{
+	struct compound_hdr hdr;
+	int status;
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+	status = decode_layoutreturn(xdr, res);
+out:
+	return status;
+}
+
+/*
  * Decode LAYOUTCOMMIT response
  */
 static int nfs4_xdr_dec_layoutcommit(struct rpc_rqst *rqstp,
@@ -6547,6 +6645,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
 	PROC(GETDEVICEINFO,	enc_getdeviceinfo,	dec_getdeviceinfo),
 	PROC(LAYOUTGET,		enc_layoutget,		dec_layoutget),
 	PROC(LAYOUTCOMMIT,	enc_layoutcommit,	dec_layoutcommit),
+	PROC(LAYOUTRETURN,	enc_layoutreturn,	dec_layoutreturn),
 #endif /* CONFIG_NFS_V4_1 */
 };
 
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index d39fcca..040219b 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -619,6 +619,62 @@ out_err_free:
 	return NULL;
 }
 
+static int
+return_layout(struct inode *ino)
+{
+	struct nfs4_layoutreturn *lrp;
+	struct nfs_server *server = NFS_SERVER(ino);
+	int status = -ENOMEM;
+
+	dprintk("--> %s\n", __func__);
+
+	lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
+	if (lrp == NULL) {
+		put_layout_hdr(NFS_I(ino)->layout);
+		goto out;
+	}
+	lrp->args.reclaim = 0;
+	lrp->args.layout_type = server->pnfs_curr_ld->id;
+	lrp->args.inode = ino;
+	lrp->clp = server->nfs_client;
+
+	status = nfs4_proc_layoutreturn(lrp);
+out:
+	dprintk("<-- %s status: %d\n", __func__, status);
+	return status;
+}
+
+/* Initiates a LAYOUTRETURN(FILE) */
+int
+_pnfs_return_layout(struct inode *ino)
+{
+	struct pnfs_layout_hdr *lo = NULL;
+	struct nfs_inode *nfsi = NFS_I(ino);
+	LIST_HEAD(tmp_list);
+	int status = 0;
+
+	dprintk("--> %s\n", __func__);
+
+	spin_lock(&ino->i_lock);
+	lo = nfsi->layout;
+	if (!lo || !mark_matching_lsegs_invalid(lo, &tmp_list, NULL)) {
+		spin_unlock(&ino->i_lock);
+		dprintk("%s: no layout segments to return\n", __func__);
+		goto out;
+	}
+	/* Reference matched in nfs4_layoutreturn_release */
+	get_layout_hdr(lo);
+	spin_unlock(&ino->i_lock);
+	pnfs_free_lseg_list(&tmp_list);
+
+	WARN_ON(test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags));
+
+	status = return_layout(ino);
+out:
+	dprintk("<-- %s status: %d\n", __func__, status);
+	return status;
+}
+
 bool pnfs_roc(struct inode *ino)
 {
 	struct pnfs_layout_hdr *lo;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 8a6e1f1..21daebb 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -129,6 +129,7 @@ extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
 extern int nfs4_proc_getdeviceinfo(struct nfs_server *server,
 				   struct pnfs_device *dev);
 extern int nfs4_proc_layoutget(struct nfs4_layoutget *lgp);
+extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp);
 
 /* pnfs.c */
 void get_layout_hdr(struct pnfs_layout_hdr *lo);
@@ -165,6 +166,7 @@ void pnfs_roc_set_barrier(struct inode *ino, u32 barrier);
 bool pnfs_roc_drain(struct inode *ino, u32 *barrier);
 void pnfs_set_layoutcommit(struct nfs_write_data *wdata);
 int pnfs_layoutcommit_inode(struct inode *inode, bool sync);
+int _pnfs_return_layout(struct inode *);
 int pnfs_ld_write_done(struct nfs_write_data *);
 int pnfs_ld_read_done(struct nfs_read_data *);
 
@@ -256,6 +258,17 @@ static inline void pnfs_clear_request_commit(struct nfs_page *req)
 		put_lseg(req->wb_commit_lseg);
 }
 
+static inline int pnfs_return_layout(struct inode *ino)
+{
+	struct nfs_inode *nfsi = NFS_I(ino);
+	struct nfs_server *nfss = NFS_SERVER(ino);
+
+	if (pnfs_enabled_sb(nfss) && nfsi->layout)
+		return _pnfs_return_layout(ino);
+
+	return 0;
+}
+
 #else  /* CONFIG_NFS_V4_1 */
 
 static inline void pnfs_destroy_all_layouts(struct nfs_client *clp)
@@ -298,6 +311,11 @@ pnfs_try_to_write_data(struct nfs_write_data *data,
 	return PNFS_NOT_ATTEMPTED;
 }
 
+static inline int pnfs_return_layout(struct inode *ino)
+{
+	return 0;
+}
+
 static inline bool
 pnfs_roc(struct inode *ino)
 {
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 178fafe..9376eaf 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -562,6 +562,7 @@ enum {
 	NFSPROC4_CLNT_LAYOUTGET,
 	NFSPROC4_CLNT_GETDEVICEINFO,
 	NFSPROC4_CLNT_LAYOUTCOMMIT,
+	NFSPROC4_CLNT_LAYOUTRETURN,
 };
 
 /* nfs41 types */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 7c8ff09..1629b69 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -269,6 +269,27 @@ struct nfs4_layoutcommit_data {
 	struct nfs4_layoutcommit_res res;
 };
 
+struct nfs4_layoutreturn_args {
+	__u32   reclaim;
+	__u32   layout_type;
+	struct inode *inode;
+	struct nfs4_sequence_args seq_args;
+};
+
+struct nfs4_layoutreturn_res {
+	struct nfs4_sequence_res seq_res;
+	u32 lrs_present;
+	nfs4_stateid stateid;
+};
+
+struct nfs4_layoutreturn {
+	struct nfs4_layoutreturn_args args;
+	struct nfs4_layoutreturn_res res;
+	struct rpc_cred *cred;
+	struct nfs_client *clp;
+	int rpc_status;
+};
+
 /*
  * Arguments to the open call.
  */
-- 
1.7.3.4


* [PATCH v5 33/38] SQUASHME: pnfs: fix layout stateid in layoutreturn args
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (31 preceding siblings ...)
  2011-05-23  0:01 ` [PATCH v5 32/38] pnfs: layoutreturn Benny Halevy
@ 2011-05-23  0:02 ` Benny Halevy
  2011-05-23  0:02 ` [PATCH v5 34/38] pnfs: layoutret_on_setattr Benny Halevy
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:02 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
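
The diff below moves the stateid read from encode time to the point where the layout segments are marked invalid. One reading of the intent, which the patch itself does not spell out, is that the request should carry the stateid that was current when the return was initiated. A minimal userspace sketch of that snapshot pattern follows; all sk_ names are hypothetical stand-ins, and the kernel performs the copy under inode->i_lock, which the sketch omits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_STATEID_SIZE 16

struct sk_stateid { uint8_t data[SK_STATEID_SIZE]; };

/* Stand-in for pnfs_layout_hdr: holds the live layout stateid. */
struct sk_layout { struct sk_stateid stateid; };

/* Stand-in for nfs4_layoutreturn_args: carries a private copy. */
struct sk_args { struct sk_stateid stateid; };

/* Snapshot the stateid when the return is initiated. */
static void sk_prepare_return(struct sk_args *args, const struct sk_layout *lo)
{
	assert(args && lo);
	args->stateid = lo->stateid;	/* struct copy */
}

/* The encode step reads only the snapshot, never the live layout. */
static void sk_encode_stateid(uint8_t *out, const struct sk_args *args)
{
	memcpy(out, args->stateid.data, SK_STATEID_SIZE);
}
```

With this shape, a later update to the layout's stateid does not change what an already prepared request sends.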
 fs/nfs/nfs4xdr.c        |    2 +-
 fs/nfs/pnfs.c           |   42 ++++++++++++++++--------------------------
 include/linux/nfs_xdr.h |    1 +
 3 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index e53d7d8..882c8f9 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1918,7 +1918,7 @@ encode_layoutreturn(struct xdr_stream *xdr,
 	p = xdr_encode_hyper(p, 0);
 	p = xdr_encode_hyper(p, NFS4_MAX_UINT64);
 	spin_lock(&args->inode->i_lock);
-	xdr_encode_opaque_fixed(p, &NFS_I(args->inode)->layout->plh_stateid.data, NFS4_STATEID_SIZE);
+	xdr_encode_opaque_fixed(p, &args->stateid.data, NFS4_STATEID_SIZE);
 	spin_unlock(&args->inode->i_lock);
 	p = reserve_space(xdr, 4);
 	*p = cpu_to_be32(0);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 040219b..d3315c7 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -619,31 +619,6 @@ out_err_free:
 	return NULL;
 }
 
-static int
-return_layout(struct inode *ino)
-{
-	struct nfs4_layoutreturn *lrp;
-	struct nfs_server *server = NFS_SERVER(ino);
-	int status = -ENOMEM;
-
-	dprintk("--> %s\n", __func__);
-
-	lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
-	if (lrp == NULL) {
-		put_layout_hdr(NFS_I(ino)->layout);
-		goto out;
-	}
-	lrp->args.reclaim = 0;
-	lrp->args.layout_type = server->pnfs_curr_ld->id;
-	lrp->args.inode = ino;
-	lrp->clp = server->nfs_client;
-
-	status = nfs4_proc_layoutreturn(lrp);
-out:
-	dprintk("<-- %s status: %d\n", __func__, status);
-	return status;
-}
-
 /* Initiates a LAYOUTRETURN(FILE) */
 int
 _pnfs_return_layout(struct inode *ino)
@@ -651,17 +626,22 @@ _pnfs_return_layout(struct inode *ino)
 	struct pnfs_layout_hdr *lo = NULL;
 	struct nfs_inode *nfsi = NFS_I(ino);
 	LIST_HEAD(tmp_list);
+	struct nfs4_layoutreturn *lrp;
 	int status = 0;
 
 	dprintk("--> %s\n", __func__);
 
+	lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
+
 	spin_lock(&ino->i_lock);
 	lo = nfsi->layout;
 	if (!lo || !mark_matching_lsegs_invalid(lo, &tmp_list, NULL)) {
 		spin_unlock(&ino->i_lock);
 		dprintk("%s: no layout segments to return\n", __func__);
+		kfree(lrp);
 		goto out;
 	}
+	lrp->args.stateid = nfsi->layout->plh_stateid;
 	/* Reference matched in nfs4_layoutreturn_release */
 	get_layout_hdr(lo);
 	spin_unlock(&ino->i_lock);
@@ -669,7 +649,17 @@ _pnfs_return_layout(struct inode *ino)
 
 	WARN_ON(test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags));
 
-	status = return_layout(ino);
+	if (lrp == NULL) {
+		put_layout_hdr(NFS_I(ino)->layout);
+		status = -ENOMEM;
+		goto out;
+	}
+	lrp->args.reclaim = 0;
+	lrp->args.layout_type = NFS_SERVER(ino)->pnfs_curr_ld->id;
+	lrp->args.inode = ino;
+	lrp->clp = NFS_SERVER(ino)->nfs_client;
+
+	status = nfs4_proc_layoutreturn(lrp);
 out:
 	dprintk("<-- %s status: %d\n", __func__, status);
 	return status;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 1629b69..22a3d03 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -273,6 +273,7 @@ struct nfs4_layoutreturn_args {
 	__u32   reclaim;
 	__u32   layout_type;
 	struct inode *inode;
+	nfs4_stateid stateid;
 	struct nfs4_sequence_args seq_args;
 };
 
-- 
1.7.3.4


* [PATCH v5 34/38] pnfs: layoutret_on_setattr
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (32 preceding siblings ...)
  2011-05-23  0:02 ` [PATCH v5 33/38] SQUASHME: pnfs: fix layout stateid in layoutreturn args Benny Halevy
@ 2011-05-23  0:02 ` Benny Halevy
  2011-05-23  0:02 ` [PATCH v5 35/38] pnfs: encode_layoutreturn Benny Halevy
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:02 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

With the objects layout security model, we have object capabilities
that are associated with the layout and we anticipate that the server
will issue a cb_layoutrecall for any setattr that changes security
related attributes (user/group/mode/acl) or truncates the file.

Therefore, the layout is returned before issuing the setattr to avoid
the anticipated cb_layoutrecall.
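
For illustration only, the policy check described above can be sketched in
userspace C as below. The type and function names (layout_driver,
LD_LAYOUTRET_ON_SETATTR, ld_layoutret_on_setattr) are hypothetical stand-ins
and not the kernel symbols; a NULL driver is assumed to model a mount without
pNFS enabled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the layout driver policy flags. */
enum ld_policy_flags {
	LD_LAYOUTRET_ON_SETATTR = 1 << 0,	/* return layout before setattr */
};

/* Hypothetical, trimmed-down layout driver descriptor. */
struct layout_driver {
	const char *name;
	unsigned flags;
};

/*
 * Returns true when the active driver asks for the layout to be returned
 * before a setattr is sent. A NULL driver (assumed here to mean "pNFS not
 * enabled on this mount") never requests a return.
 */
static bool ld_layoutret_on_setattr(const struct layout_driver *ld)
{
	if (ld == NULL)
		return false;
	return (ld->flags & LD_LAYOUTRET_ON_SETATTR) != 0;
}
```

A driver that leaves flags at zero keeps the previous behaviour, so only the
objects layout opts in.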

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/nfs4proc.c            |    3 +++
 fs/nfs/objlayout/objio_osd.c |    1 +
 fs/nfs/pnfs.h                |   22 ++++++++++++++++++++++
 3 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 5b4124e..5734009 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2361,6 +2361,9 @@ nfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
 	struct nfs4_state *state = NULL;
 	int status;
 
+	if (pnfs_ld_layoutret_on_setattr(inode))
+		pnfs_return_layout(inode);
+
 	nfs_fattr_init(fattr);
 	
 	/* Search for an existing open(O_WRITE) file */
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 55fa5fb..370da29 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -937,6 +937,7 @@ ssize_t objio_write_pagelist(struct objlayout_io_state *ol_state, bool stable)
 static struct pnfs_layoutdriver_type objlayout_type = {
 	.id = LAYOUT_OSD2_OBJECTS,
 	.name = "LAYOUT_OSD2_OBJECTS",
+	.flags                   = PNFS_LAYOUTRET_ON_SETATTR,
 
 	.alloc_layout_hdr        = objlayout_alloc_layout_hdr,
 	.free_layout_hdr         = objlayout_free_layout_hdr,
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 21daebb..2858553 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -65,6 +65,11 @@ enum {
 	NFS_LAYOUT_DESTROYED,		/* no new use of layout allowed */
 };
 
+enum layoutdriver_policy_flags {
+	/* Should the pNFS client commit and return the layout upon a setattr */
+	PNFS_LAYOUTRET_ON_SETATTR	= 1 << 0,
+};
+
 struct nfs4_deviceid_node;
 
 /* Per-layout driver specific registration structure */
@@ -73,6 +78,7 @@ struct pnfs_layoutdriver_type {
 	const u32 id;
 	const char *name;
 	struct module *owner;
+	unsigned flags;
 
 	struct pnfs_layout_hdr * (*alloc_layout_hdr) (struct inode *inode, gfp_t gfp_flags);
 	void (*free_layout_hdr) (struct pnfs_layout_hdr *);
@@ -258,6 +264,16 @@ static inline void pnfs_clear_request_commit(struct nfs_page *req)
 		put_lseg(req->wb_commit_lseg);
 }
 
+/* Should the pNFS client commit and return the layout upon a setattr */
+static inline bool
+pnfs_ld_layoutret_on_setattr(struct inode *inode)
+{
+	if (!pnfs_enabled_sb(NFS_SERVER(inode)))
+		return false;
+	return NFS_SERVER(inode)->pnfs_curr_ld->flags &
+		PNFS_LAYOUTRET_ON_SETATTR;
+}
+
 static inline int pnfs_return_layout(struct inode *ino)
 {
 	struct nfs_inode *nfsi = NFS_I(ino);
@@ -317,6 +333,12 @@ static inline int pnfs_return_layout(struct inode *ino)
 }
 
 static inline bool
+pnfs_ld_layoutret_on_setattr(struct inode *inode)
+{
+	return false;
+}
+
+static inline bool
 pnfs_roc(struct inode *ino)
 {
 	return false;
-- 
1.7.3.4


* [PATCH v5 35/38] pnfs: encode_layoutreturn
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (33 preceding siblings ...)
  2011-05-23  0:02 ` [PATCH v5 34/38] pnfs: layoutret_on_setattr Benny Halevy
@ 2011-05-23  0:02 ` Benny Halevy
  2011-05-23  0:02 ` [PATCH v5 36/38] pnfs-obj: report errors and .encode_layoutreturn Implementation Benny Halevy
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:02 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

From: Andy Adamson <andros@netapp.com>

Add a layout driver method to encode the layout type specific
opaque part of layout return in-line in the xdr stream.

Currently the pnfs-objects layout driver uses it to encode i/o error
information on LAYOUTRETURN.
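
As a rough userspace sketch of the dispatch described above (not the kernel
code): if the driver supplies a hook it writes the opaque body itself,
otherwise a zero-length opaque is emitted. struct xdr_buf, xdr_reserve(),
put_be32(), encode_opaque_body() and demo_hook() are all hypothetical names
introduced for this example; the demo hook also shows one way a driver could
reserve the length word first and back-patch it after the payload is written.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size stand-in for the kernel's xdr stream. */
struct xdr_buf {
	uint8_t data[64];
	size_t len;
};

/* Reserve n bytes at the tail; returns NULL when the buffer is full. */
static uint8_t *xdr_reserve(struct xdr_buf *x, size_t n)
{
	uint8_t *p;

	if (n > sizeof(x->data) - x->len)
		return NULL;
	p = x->data + x->len;
	x->len += n;
	return p;
}

/* XDR integers are big-endian on the wire. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

typedef void (*encode_body_fn)(struct xdr_buf *x);

/* Use the driver hook when present, else encode a zero-length opaque. */
static void encode_opaque_body(struct xdr_buf *x, encode_body_fn hook)
{
	uint8_t *p;

	if (hook) {
		hook(x);	/* driver writes length word and payload */
		return;
	}
	p = xdr_reserve(x, 4);
	if (p)
		put_be32(p, 0);
}

/* Example hook: reserve the length word, add 8 payload bytes, patch length. */
static void demo_hook(struct xdr_buf *x)
{
	uint8_t *start = xdr_reserve(x, 4);
	uint8_t *p;

	if (!start)
		return;
	p = xdr_reserve(x, 8);
	if (p) {
		put_be32(p, 0xdeadbeef);
		put_be32(p + 4, 1);
	}
	/* payload length = bytes written after the length word */
	put_be32(start, (uint32_t)(x->data + x->len - start - 4));
}
```

Calling encode_opaque_body() with a NULL hook appends four zero bytes; with
demo_hook it appends a length word of 8 followed by the payload.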

Signed-off-by: Andy Adamson <andros@netapp.com>
[fixup layout header pointer for encode_layoutreturn]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/nfs4xdr.c |    9 +++++++--
 fs/nfs/pnfs.h    |    4 ++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 882c8f9..ae580fa 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1920,8 +1920,13 @@ encode_layoutreturn(struct xdr_stream *xdr,
 	spin_lock(&args->inode->i_lock);
 	xdr_encode_opaque_fixed(p, &args->stateid.data, NFS4_STATEID_SIZE);
 	spin_unlock(&args->inode->i_lock);
-	p = reserve_space(xdr, 4);
-	*p = cpu_to_be32(0);
+	if (NFS_SERVER(args->inode)->pnfs_curr_ld->encode_layoutreturn) {
+		NFS_SERVER(args->inode)->pnfs_curr_ld->encode_layoutreturn(
+			NFS_I(args->inode)->layout, xdr, args);
+	} else {
+		p = reserve_space(xdr, 4);
+		*p = cpu_to_be32(0);
+	}
 	hdr->nops++;
 	hdr->replen += decode_layoutreturn_maxsz;
 }
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 2858553..ed07f41 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -104,6 +104,10 @@ struct pnfs_layoutdriver_type {
 	enum pnfs_try_status (*write_pagelist) (struct nfs_write_data *nfs_data, int how);
 
 	void (*free_deviceid_node) (struct nfs4_deviceid_node *);
+
+	void (*encode_layoutreturn) (struct pnfs_layout_hdr *layoutid,
+				     struct xdr_stream *xdr,
+				     const struct nfs4_layoutreturn_args *args);
 };
 
 struct pnfs_layout_hdr {
-- 
1.7.3.4


* [PATCH v5 36/38] pnfs-obj: report errors and .encode_layoutreturn Implementation.
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (34 preceding siblings ...)
  2011-05-23  0:02 ` [PATCH v5 35/38] pnfs: encode_layoutreturn Benny Halevy
@ 2011-05-23  0:02 ` Benny Halevy
  2011-05-23  0:03 ` [PATCH v5 37/38] pnfs: encode_layoutcommit Benny Halevy
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:02 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

From: Boaz Harrosh <bharrosh@panasas.com>

An io_state pre-allocates an error information structure for each
osd-device that might fail during I/O. When the I/O completes
successfully, the io_state is freed (as today). If the I/O ended with an
error, the io_state is queued on a per-layout err_list. When
encode_layoutreturn() is eventually called, each error is encoded into the
XDR buffer, and only then is the io_state removed from err_list and
de-allocated.

It is up to the io_engine to fill in the segment that faulted and the type
of osd_error that occurred, by calling objlayout_io_set_result() for
each failing device.

In objio_osd:
* Allocate io-error descriptors space as part of io_state
* Use generic objlayout error reporting at end of io.
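
The lifecycle described above can be sketched in plain userspace C as follows.
This is a simplified model under stated assumptions, not the patch itself: it
uses a singly linked list instead of list_head, omits locking and XDR
encoding, and every name (io_err, io_state, layout, io_state_alloc,
io_set_result, io_done, layoutreturn_drain) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down error descriptor for one device. */
struct io_err {
	int err;			/* 0 means no error on this device */
	unsigned long long offset;
	unsigned long long length;
};

struct io_state {
	struct io_state *next;		/* link on the layout's error list */
	int status;			/* < 0 when the I/O failed */
	unsigned num_comps;
	struct io_err errs[];		/* one slot per device */
};

struct layout {
	struct io_state *err_list;	/* failed I/O states awaiting report */
};

/* Allocate the state and its per-device error slots in one allocation. */
static struct io_state *io_state_alloc(unsigned num_comps)
{
	struct io_state *s;

	s = calloc(1, sizeof(*s) + num_comps * sizeof(s->errs[0]));
	if (s)
		s->num_comps = num_comps;
	return s;
}

/* Record the result for one device; called by the I/O engine. */
static void io_set_result(struct io_state *s, unsigned index, int err,
			  unsigned long long offset, unsigned long long length)
{
	assert(index < s->num_comps);
	s->errs[index].err = err;
	s->errs[index].offset = offset;
	s->errs[index].length = length;
}

/* On completion: free on success, otherwise queue for layoutreturn. */
static void io_done(struct layout *lo, struct io_state *s)
{
	if (s->status >= 0) {
		free(s);
		return;
	}
	s->next = lo->err_list;
	lo->err_list = s;
}

/* At layoutreturn: report every recorded error, then free the states. */
static unsigned layoutreturn_drain(struct layout *lo)
{
	unsigned reported = 0;

	while (lo->err_list) {
		struct io_state *s = lo->err_list;
		unsigned i;

		lo->err_list = s->next;
		for (i = 0; i < s->num_comps; i++)
			if (s->errs[i].err)
				reported++;	/* real code encodes to XDR here */
		free(s);
	}
	return reported;
}
```

In this model a state is only freed after its errors have been counted, which
mirrors the "encode first, then de-allocate" ordering in the description.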

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/objlayout/objio_osd.c |   44 ++++++++-
 fs/nfs/objlayout/objlayout.c |  232 +++++++++++++++++++++++++++++++++++++++++-
 fs/nfs/objlayout/objlayout.h |   23 ++++
 3 files changed, 297 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 370da29..8af6290 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -384,13 +384,17 @@ int objio_alloc_io_state(struct pnfs_layout_segment *lseg,
 	struct objio_state *ios;
 	const unsigned first_size = sizeof(*ios) +
 				objio_seg->num_comps * sizeof(ios->per_dev[0]);
+	const unsigned sec_size = objio_seg->num_comps *
+						sizeof(ios->ol_state.ioerrs[0]);
 
 	dprintk("%s: num_comps=%d\n", __func__, objio_seg->num_comps);
-	ios = kzalloc(first_size, gfp_flags);
+	ios = kzalloc(first_size + sec_size, gfp_flags);
 	if (unlikely(!ios))
 		return -ENOMEM;
 
 	ios->layout = objio_seg;
+	ios->ol_state.ioerrs = ((void *)ios) + first_size;
+	ios->ol_state.num_comps = objio_seg->num_comps;
 
 	*outp = &ios->ol_state;
 	return 0;
@@ -404,6 +408,36 @@ void objio_free_io_state(struct objlayout_io_state *ol_state)
 	kfree(ios);
 }
 
+enum pnfs_osd_errno osd_pri_2_pnfs_err(enum osd_err_priority oep)
+{
+	switch (oep) {
+	case OSD_ERR_PRI_NO_ERROR:
+		return (enum pnfs_osd_errno)0;
+
+	case OSD_ERR_PRI_CLEAR_PAGES:
+		BUG_ON(1);
+		return 0;
+
+	case OSD_ERR_PRI_RESOURCE:
+		return PNFS_OSD_ERR_RESOURCE;
+	case OSD_ERR_PRI_BAD_CRED:
+		return PNFS_OSD_ERR_BAD_CRED;
+	case OSD_ERR_PRI_NO_ACCESS:
+		return PNFS_OSD_ERR_NO_ACCESS;
+	case OSD_ERR_PRI_UNREACHABLE:
+		return PNFS_OSD_ERR_UNREACHABLE;
+	case OSD_ERR_PRI_NOT_FOUND:
+		return PNFS_OSD_ERR_NOT_FOUND;
+	case OSD_ERR_PRI_NO_SPACE:
+		return PNFS_OSD_ERR_NO_SPACE;
+	default:
+		WARN_ON(1);
+		/* fallthrough */
+	case OSD_ERR_PRI_EIO:
+		return PNFS_OSD_ERR_EIO;
+	}
+}
+
 static void _clear_bio(struct bio *bio)
 {
 	struct bio_vec *bv;
@@ -450,6 +484,12 @@ static int _io_check(struct objio_state *ios, bool is_write)
 			continue; /* we recovered */
 		}
 		dev = ios->per_dev[i].dev;
+		objlayout_io_set_result(&ios->ol_state, dev,
+					&ios->layout->comps[dev].oc_object_id,
+					osd_pri_2_pnfs_err(osi.osd_err_pri),
+					ios->per_dev[i].offset,
+					ios->per_dev[i].length,
+					is_write);
 
 		if (osi.osd_err_pri >= oep) {
 			oep = osi.osd_err_pri;
@@ -949,6 +989,8 @@ static struct pnfs_layoutdriver_type objlayout_type = {
 	.write_pagelist          = objlayout_write_pagelist,
 
 	.free_deviceid_node	 = objio_free_deviceid_node,
+
+	.encode_layoutreturn     = objlayout_encode_layoutreturn,
 };
 
 MODULE_DESCRIPTION("pNFS Layout Driver for OSD2 objects");
diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
index 5157ef6..f7caecf 100644
--- a/fs/nfs/objlayout/objlayout.c
+++ b/fs/nfs/objlayout/objlayout.c
@@ -50,6 +50,10 @@ objlayout_alloc_layout_hdr(struct inode *inode, gfp_t gfp_flags)
 	struct objlayout *objlay;
 
 	objlay = kzalloc(sizeof(struct objlayout), gfp_flags);
+	if (objlay) {
+		spin_lock_init(&objlay->lock);
+		INIT_LIST_HEAD(&objlay->err_list);
+	}
 	dprintk("%s: Return %p\n", __func__, objlay);
 	return &objlay->pnfs_layout;
 }
@@ -64,6 +68,7 @@ objlayout_free_layout_hdr(struct pnfs_layout_hdr *lo)
 
 	dprintk("%s: objlay %p\n", __func__, objlay);
 
+	WARN_ON(!list_empty(&objlay->err_list));
 	kfree(objlay);
 }
 
@@ -183,6 +188,7 @@ objlayout_alloc_io_state(struct pnfs_layout_hdr *pnfs_layout_type,
 		pgbase &= ~PAGE_MASK;
 	}
 
+	INIT_LIST_HEAD(&state->err_list);
 	state->lseg = lseg;
 	state->rpcdata = rpcdata;
 	state->pages = pages;
@@ -213,7 +219,52 @@ objlayout_iodone(struct objlayout_io_state *state)
 {
 	dprintk("%s: state %p status\n", __func__, state);
 
-	objlayout_free_io_state(state);
+	if (likely(state->status >= 0)) {
+		objlayout_free_io_state(state);
+	} else {
+		struct objlayout *objlay = OBJLAYOUT(state->lseg->pls_layout);
+
+		spin_lock(&objlay->lock);
+		list_add(&objlay->err_list, &state->err_list);
+		spin_unlock(&objlay->lock);
+	}
+}
+
+/*
+ * objlayout_io_set_result - Set an osd_error code on a specific osd comp.
+ *
+ * The @index component IO failed (error returned from target). Register
+ * the error for later reporting at layout-return.
+ */
+void
+objlayout_io_set_result(struct objlayout_io_state *state, unsigned index,
+			struct pnfs_osd_objid *pooid, int osd_error,
+			u64 offset, u64 length, bool is_write)
+{
+	struct pnfs_osd_ioerr *ioerr = &state->ioerrs[index];
+
+	BUG_ON(index >= state->num_comps);
+	if (osd_error) {
+		ioerr->oer_component = *pooid;
+		ioerr->oer_comp_offset = offset;
+		ioerr->oer_comp_length = length;
+		ioerr->oer_iswrite = is_write;
+		ioerr->oer_errno = osd_error;
+
+		dprintk("%s: err[%d]: errno=%d is_write=%d dev(%llx:%llx) "
+			"par=0x%llx obj=0x%llx offset=0x%llx length=0x%llx\n",
+			__func__, index, ioerr->oer_errno,
+			ioerr->oer_iswrite,
+			_DEVID_LO(&ioerr->oer_component.oid_device_id),
+			_DEVID_HI(&ioerr->oer_component.oid_device_id),
+			ioerr->oer_component.oid_partition_id,
+			ioerr->oer_component.oid_object_id,
+			ioerr->oer_comp_offset,
+			ioerr->oer_comp_length);
+	} else {
+		/* User need not call if no error is reported */
+		ioerr->oer_errno = 0;
+	}
 }
 
 /* Function scheduled on rpc workqueue to call ->nfs_readlist_complete().
@@ -382,6 +433,185 @@ objlayout_write_pagelist(struct nfs_write_data *wdata,
 	return PNFS_ATTEMPTED;
 }
 
+static int
+err_prio(u32 oer_errno)
+{
+	switch (oer_errno) {
+	case 0:
+		return 0;
+
+	case PNFS_OSD_ERR_RESOURCE:
+		return OSD_ERR_PRI_RESOURCE;
+	case PNFS_OSD_ERR_BAD_CRED:
+		return OSD_ERR_PRI_BAD_CRED;
+	case PNFS_OSD_ERR_NO_ACCESS:
+		return OSD_ERR_PRI_NO_ACCESS;
+	case PNFS_OSD_ERR_UNREACHABLE:
+		return OSD_ERR_PRI_UNREACHABLE;
+	case PNFS_OSD_ERR_NOT_FOUND:
+		return OSD_ERR_PRI_NOT_FOUND;
+	case PNFS_OSD_ERR_NO_SPACE:
+		return OSD_ERR_PRI_NO_SPACE;
+	default:
+		WARN_ON(1);
+		/* fallthrough */
+	case PNFS_OSD_ERR_EIO:
+		return OSD_ERR_PRI_EIO;
+	}
+}
+
+static void
+merge_ioerr(struct pnfs_osd_ioerr *dest_err,
+	    const struct pnfs_osd_ioerr *src_err)
+{
+	u64 dest_end, src_end;
+
+	if (!dest_err->oer_errno) {
+		*dest_err = *src_err;
+		/* accumulated device must be blank */
+		memset(&dest_err->oer_component.oid_device_id, 0,
+			sizeof(dest_err->oer_component.oid_device_id));
+
+		return;
+	}
+
+	if (dest_err->oer_component.oid_partition_id !=
+				src_err->oer_component.oid_partition_id)
+		dest_err->oer_component.oid_partition_id = 0;
+
+	if (dest_err->oer_component.oid_object_id !=
+				src_err->oer_component.oid_object_id)
+		dest_err->oer_component.oid_object_id = 0;
+
+	if (dest_err->oer_comp_offset > src_err->oer_comp_offset)
+		dest_err->oer_comp_offset = src_err->oer_comp_offset;
+
+	dest_end = end_offset(dest_err->oer_comp_offset,
+			      dest_err->oer_comp_length);
+	src_end =  end_offset(src_err->oer_comp_offset,
+			      src_err->oer_comp_length);
+	if (dest_end < src_end)
+		dest_end = src_end;
+
+	dest_err->oer_comp_length = dest_end - dest_err->oer_comp_offset;
+
+	if ((src_err->oer_iswrite == dest_err->oer_iswrite) &&
+	    (err_prio(src_err->oer_errno) > err_prio(dest_err->oer_errno))) {
+			dest_err->oer_errno = src_err->oer_errno;
+	} else if (src_err->oer_iswrite) {
+		dest_err->oer_iswrite = true;
+		dest_err->oer_errno = src_err->oer_errno;
+	}
+}
+
+static void
+encode_accumulated_error(struct objlayout *objlay, __be32 *p)
+{
+	struct objlayout_io_state *state, *tmp;
+	struct pnfs_osd_ioerr accumulated_err = {.oer_errno = 0};
+
+	list_for_each_entry_safe(state, tmp, &objlay->err_list, err_list) {
+		unsigned i;
+
+		for (i = 0; i < state->num_comps; i++) {
+			struct pnfs_osd_ioerr *ioerr = &state->ioerrs[i];
+
+			if (!ioerr->oer_errno)
+				continue;
+
+			printk(KERN_ERR "%s: err[%d]: errno=%d is_write=%d "
+				"dev(%llx:%llx) par=0x%llx obj=0x%llx "
+				"offset=0x%llx length=0x%llx\n",
+				__func__, i, ioerr->oer_errno,
+				ioerr->oer_iswrite,
+				_DEVID_LO(&ioerr->oer_component.oid_device_id),
+				_DEVID_HI(&ioerr->oer_component.oid_device_id),
+				ioerr->oer_component.oid_partition_id,
+				ioerr->oer_component.oid_object_id,
+				ioerr->oer_comp_offset,
+				ioerr->oer_comp_length);
+
+			merge_ioerr(&accumulated_err, ioerr);
+		}
+		list_del(&state->err_list);
+		objlayout_free_io_state(state);
+	}
+
+	pnfs_osd_xdr_encode_ioerr(p, &accumulated_err);
+}
+
+void
+objlayout_encode_layoutreturn(struct pnfs_layout_hdr *pnfslay,
+			      struct xdr_stream *xdr,
+			      const struct nfs4_layoutreturn_args *args)
+{
+	struct objlayout *objlay = OBJLAYOUT(pnfslay);
+	struct objlayout_io_state *state, *tmp;
+	__be32 *start;
+
+	dprintk("%s: Begin\n", __func__);
+	start = xdr_reserve_space(xdr, 4);
+	BUG_ON(!start);
+
+	spin_lock(&objlay->lock);
+
+	list_for_each_entry_safe(state, tmp, &objlay->err_list, err_list) {
+		__be32 *last_xdr = NULL, *p;
+		unsigned i;
+		int res = 0;
+
+		for (i = 0; i < state->num_comps; i++) {
+			struct pnfs_osd_ioerr *ioerr = &state->ioerrs[i];
+
+			if (!ioerr->oer_errno)
+				continue;
+
+			dprintk("%s: err[%d]: errno=%d is_write=%d "
+				"dev(%llx:%llx) par=0x%llx obj=0x%llx "
+				"offset=0x%llx length=0x%llx\n",
+				__func__, i, ioerr->oer_errno,
+				ioerr->oer_iswrite,
+				_DEVID_LO(&ioerr->oer_component.oid_device_id),
+				_DEVID_HI(&ioerr->oer_component.oid_device_id),
+				ioerr->oer_component.oid_partition_id,
+				ioerr->oer_component.oid_object_id,
+				ioerr->oer_comp_offset,
+				ioerr->oer_comp_length);
+
+			p = pnfs_osd_xdr_ioerr_reserve_space(xdr);
+			if (unlikely(!p)) {
+				res = -E2BIG;
+				break; /* accumulated_error */
+			}
+
+			last_xdr = p;
+			pnfs_osd_xdr_encode_ioerr(p, &state->ioerrs[i]);
+		}
+
+		/* TODO: use xdr_write_pages */
+		if (unlikely(res)) {
+			/* no space for even one error descriptor */
+			BUG_ON(!last_xdr);
+
+			/* we've encountered a situation with lots and lots of
+			 * errors and no space to encode them all. Use the last
+			 * available slot to report the union of all the
+			 * remaining errors.
+			 */
+			encode_accumulated_error(objlay, last_xdr);
+			goto loop_done;
+		}
+		list_del(&state->err_list);
+		objlayout_free_io_state(state);
+	}
+loop_done:
+	spin_unlock(&objlay->lock);
+
+	*start = cpu_to_be32((xdr->p - start - 1) * 4);
+	dprintk("%s: Return\n", __func__);
+}
+
+
 /*
  * Get Device Info API for io engines
  */
diff --git a/fs/nfs/objlayout/objlayout.h b/fs/nfs/objlayout/objlayout.h
index 9a405e8..b0bb975 100644
--- a/fs/nfs/objlayout/objlayout.h
+++ b/fs/nfs/objlayout/objlayout.h
@@ -50,6 +50,10 @@
  */
 struct objlayout {
 	struct pnfs_layout_hdr pnfs_layout;
+
+	 /* for layout_return */
+	spinlock_t lock;
+	struct list_head err_list;
 };
 
 static inline struct objlayout *
@@ -76,6 +80,16 @@ struct objlayout_io_state {
 	int status;             /* res */
 	int eof;                /* res */
 	int committed;          /* res */
+
+	/* Error reporting (layout_return) */
+	struct list_head err_list;
+	unsigned num_comps;
+	/* Pointer to array of error descriptors of size num_comps.
+	 * It should contain as many entries as devices in the osd_layout
+	 * that participate in the I/O. It is up to the io_engine to allocate
+	 * needed space and set num_comps.
+	 */
+	struct pnfs_osd_ioerr *ioerrs;
 };
 
 /*
@@ -101,6 +115,10 @@ extern ssize_t objio_write_pagelist(struct objlayout_io_state *ol_state,
 /*
  * callback API
  */
+extern void objlayout_io_set_result(struct objlayout_io_state *state,
+			unsigned index, struct pnfs_osd_objid *pooid,
+			int osd_error, u64 offset, u64 length, bool is_write);
+
 extern void objlayout_read_done(struct objlayout_io_state *state,
 				ssize_t status, bool sync);
 extern void objlayout_write_done(struct objlayout_io_state *state,
@@ -131,4 +149,9 @@ extern enum pnfs_try_status objlayout_write_pagelist(
 	struct nfs_write_data *,
 	int how);
 
+extern void objlayout_encode_layoutreturn(
+	struct pnfs_layout_hdr *,
+	struct xdr_stream *,
+	const struct nfs4_layoutreturn_args *);
+
 #endif /* _OBJLAYOUT_H */
-- 
1.7.3.4


* [PATCH v5 37/38] pnfs: encode_layoutcommit
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (35 preceding siblings ...)
  2011-05-23  0:02 ` [PATCH v5 36/38] pnfs-obj: report errors and .encode_layoutreturn Implementation Benny Halevy
@ 2011-05-23  0:03 ` Benny Halevy
  2011-05-23  0:03 ` [PATCH v5 38/38] pnfs-obj: objlayout_encode_layoutcommit implementation Benny Halevy
  2011-05-23  0:22 ` [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:03 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

Add a layout driver method to encode the layout type specific
opaque part of layout commit in-line in the xdr stream.

Currently, the pnfs-objects layout driver uses it to encode metadata hints
to the MDS, and the blocks layout driver uses it to commit provisionally
allocated extents to the file.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/nfs4xdr.c |   16 +++++++++++++---
 fs/nfs/pnfs.h    |    4 ++++
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index ae580fa..899ffc0 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1875,6 +1875,7 @@ encode_layoutget(struct xdr_stream *xdr,
 
 static int
 encode_layoutcommit(struct xdr_stream *xdr,
+		    struct inode *inode,
 		    const struct nfs4_layoutcommit_args *args,
 		    struct compound_hdr *hdr)
 {
@@ -1883,7 +1884,7 @@ encode_layoutcommit(struct xdr_stream *xdr,
 	dprintk("%s: lbw: %llu type: %d\n", __func__, args->lastbytewritten,
 		NFS_SERVER(args->inode)->pnfs_curr_ld->id);
 
-	p = reserve_space(xdr, 48 + NFS4_STATEID_SIZE);
+	p = reserve_space(xdr, 44 + NFS4_STATEID_SIZE);
 	*p++ = cpu_to_be32(OP_LAYOUTCOMMIT);
 	/* Only whole file layouts */
 	p = xdr_encode_hyper(p, 0); /* offset */
@@ -1894,7 +1895,14 @@ encode_layoutcommit(struct xdr_stream *xdr,
 	p = xdr_encode_hyper(p, args->lastbytewritten);
 	*p++ = cpu_to_be32(0); /* Never send time_modify_changed */
 	*p++ = cpu_to_be32(NFS_SERVER(args->inode)->pnfs_curr_ld->id);/* type */
-	*p++ = cpu_to_be32(0); /* no file layout payload */
+
+	if (NFS_SERVER(inode)->pnfs_curr_ld->encode_layoutcommit)
+		NFS_SERVER(inode)->pnfs_curr_ld->encode_layoutcommit(
+			NFS_I(inode)->layout, xdr, args);
+	else {
+		p = reserve_space(xdr, 4);
+		*p = cpu_to_be32(0); /* no layout-type payload */
+	}
 
 	hdr->nops++;
 	hdr->replen += decode_layoutcommit_maxsz;
@@ -2751,6 +2759,8 @@ static void nfs4_xdr_enc_layoutcommit(struct rpc_rqst *req,
 				      struct xdr_stream *xdr,
 				      struct nfs4_layoutcommit_args *args)
 {
+	struct nfs4_layoutcommit_data *data =
+		container_of(args, struct nfs4_layoutcommit_data, args);
 	struct compound_hdr hdr = {
 		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
 	};
@@ -2758,7 +2768,7 @@ static void nfs4_xdr_enc_layoutcommit(struct rpc_rqst *req,
 	encode_compound_hdr(xdr, req, &hdr);
 	encode_sequence(xdr, &args->seq_args, &hdr);
 	encode_putfh(xdr, NFS_FH(args->inode), &hdr);
-	encode_layoutcommit(xdr, args, &hdr);
+	encode_layoutcommit(xdr, data->args.inode, args, &hdr);
 	encode_getfattr(xdr, args->bitmask, &hdr);
 	encode_nops(&hdr);
 }
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index ed07f41..25da946 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -108,6 +108,10 @@ struct pnfs_layoutdriver_type {
 	void (*encode_layoutreturn) (struct pnfs_layout_hdr *layoutid,
 				     struct xdr_stream *xdr,
 				     const struct nfs4_layoutreturn_args *args);
+
+	void (*encode_layoutcommit) (struct pnfs_layout_hdr *layoutid,
+				     struct xdr_stream *xdr,
+				     const struct nfs4_layoutcommit_args *args);
 };
 
 struct pnfs_layout_hdr {
-- 
1.7.3.4


* [PATCH v5 38/38] pnfs-obj: objlayout_encode_layoutcommit implementation
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (36 preceding siblings ...)
  2011-05-23  0:03 ` [PATCH v5 37/38] pnfs: encode_layoutcommit Benny Halevy
@ 2011-05-23  0:03 ` Benny Halevy
  2011-05-23  0:22 ` [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:03 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Boaz Harrosh, linux-nfs

From: Boaz Harrosh <bharrosh@panasas.com>

* Define API for io-engines to report delta_space_used in IOs
* Encode the osd-layout specific information of the layoutcommit
  XDR buffer.
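
A minimal userspace sketch of the delta_space_used accounting, assuming the
three-state validity tracking used by this patch: once any I/O fails, the
running total is treated as invalid until the next layoutcommit resets it.
Locking is omitted, and the names (dsu_state, layout_acct, layoutupdate,
acct_add, acct_io_failed, acct_commit) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum dsu_state {
	DSU_INIT = 0,	/* nothing reported since the last commit */
	DSU_VALID,	/* at least one report, none failed */
	DSU_INVALID,	/* an I/O failed, total cannot be trusted */
};

struct layout_acct {
	enum dsu_state valid;
	int64_t delta_space_used;
};

/* What gets handed to the layoutcommit encoder. */
struct layoutupdate {
	bool dsu_valid;
	int64_t dsu_delta;
};

/* I/O engine reports space consumed (or released) by a write. */
static void acct_add(struct layout_acct *a, int64_t used)
{
	if (a->valid != DSU_INVALID) {
		a->valid = DSU_VALID;
		a->delta_space_used += used;
	}
}

/* A failed I/O poisons the report until the next commit. */
static void acct_io_failed(struct layout_acct *a)
{
	a->valid = DSU_INVALID;
}

/* Snapshot the totals for layoutcommit and reset the accounting. */
static struct layoutupdate acct_commit(struct layout_acct *a)
{
	struct layoutupdate lou;

	lou.dsu_valid = (a->valid == DSU_VALID);
	lou.dsu_delta = a->delta_space_used;
	a->delta_space_used = 0;
	a->valid = DSU_INIT;
	return lou;
}
```

The reset on commit means a single failed I/O only suppresses the report for
the commit interval in which it happened.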

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 fs/nfs/objlayout/objio_osd.c |    1 +
 fs/nfs/objlayout/objlayout.c |   30 ++++++++++++++++++++++++++++++
 fs/nfs/objlayout/objlayout.h |   30 ++++++++++++++++++++++++++++++
 3 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 8af6290..4140837 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -990,6 +990,7 @@ static struct pnfs_layoutdriver_type objlayout_type = {
 
 	.free_deviceid_node	 = objio_free_deviceid_node,
 
+	.encode_layoutcommit	 = objlayout_encode_layoutcommit,
 	.encode_layoutreturn     = objlayout_encode_layoutreturn,
 };
 
diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
index f7caecf..dc3956c 100644
--- a/fs/nfs/objlayout/objlayout.c
+++ b/fs/nfs/objlayout/objlayout.c
@@ -225,6 +225,7 @@ objlayout_iodone(struct objlayout_io_state *state)
 		struct objlayout *objlay = OBJLAYOUT(state->lseg->pls_layout);
 
 		spin_lock(&objlay->lock);
+		objlay->delta_space_valid = OBJ_DSU_INVALID;
 		list_add(&objlay->err_list, &state->err_list);
 		spin_unlock(&objlay->lock);
 	}
@@ -433,6 +434,35 @@ objlayout_write_pagelist(struct nfs_write_data *wdata,
 	return PNFS_ATTEMPTED;
 }
 
+void
+objlayout_encode_layoutcommit(struct pnfs_layout_hdr *pnfslay,
+			      struct xdr_stream *xdr,
+			      const struct nfs4_layoutcommit_args *args)
+{
+	struct objlayout *objlay = OBJLAYOUT(pnfslay);
+	struct pnfs_osd_layoutupdate lou;
+	__be32 *start;
+
+	dprintk("%s: Begin\n", __func__);
+
+	spin_lock(&objlay->lock);
+	lou.dsu_valid = (objlay->delta_space_valid == OBJ_DSU_VALID);
+	lou.dsu_delta = objlay->delta_space_used;
+	objlay->delta_space_used = 0;
+	objlay->delta_space_valid = OBJ_DSU_INIT;
+	lou.olu_ioerr_flag = !list_empty(&objlay->err_list);
+	spin_unlock(&objlay->lock);
+
+	start = xdr_reserve_space(xdr, 4);
+
+	BUG_ON(pnfs_osd_xdr_encode_layoutupdate(xdr, &lou));
+
+	*start = cpu_to_be32((xdr->p - start - 1) * 4);
+
+	dprintk("%s: Return delta_space_used %lld err %d\n", __func__,
+		lou.dsu_delta, lou.olu_ioerr_flag);
+}
+
 static int
 err_prio(u32 oer_errno)
 {
diff --git a/fs/nfs/objlayout/objlayout.h b/fs/nfs/objlayout/objlayout.h
index b0bb975..a8244c8 100644
--- a/fs/nfs/objlayout/objlayout.h
+++ b/fs/nfs/objlayout/objlayout.h
@@ -51,6 +51,14 @@
 struct objlayout {
 	struct pnfs_layout_hdr pnfs_layout;
 
+	 /* for layout_commit */
+	enum osd_delta_space_valid_enum {
+		OBJ_DSU_INIT = 0,
+		OBJ_DSU_VALID,
+		OBJ_DSU_INVALID,
+	} delta_space_valid;
+	s64 delta_space_used;  /* consumed by write ops */
+
 	 /* for layout_return */
 	spinlock_t lock;
 	struct list_head err_list;
@@ -119,6 +127,23 @@ extern void objlayout_io_set_result(struct objlayout_io_state *state,
 			unsigned index, struct pnfs_osd_objid *pooid,
 			int osd_error, u64 offset, u64 length, bool is_write);
 
+static inline void
+objlayout_add_delta_space_used(struct objlayout_io_state *state, s64 space_used)
+{
+	struct objlayout *objlay = OBJLAYOUT(state->lseg->pls_layout);
+
+	/* If one of the I/Os errored out and the delta_space_used was
+	 * invalid we render the complete report as invalid. Protocol mandate
+	 * the DSU be accurate or not reported.
+	 */
+	spin_lock(&objlay->lock);
+	if (objlay->delta_space_valid != OBJ_DSU_INVALID) {
+		objlay->delta_space_valid = OBJ_DSU_VALID;
+		objlay->delta_space_used += space_used;
+	}
+	spin_unlock(&objlay->lock);
+}
+
 extern void objlayout_read_done(struct objlayout_io_state *state,
 				ssize_t status, bool sync);
 extern void objlayout_write_done(struct objlayout_io_state *state,
@@ -149,6 +174,11 @@ extern enum pnfs_try_status objlayout_write_pagelist(
 	struct nfs_write_data *,
 	int how);
 
+extern void objlayout_encode_layoutcommit(
+	struct pnfs_layout_hdr *,
+	struct xdr_stream *,
+	const struct nfs4_layoutcommit_args *);
+
 extern void objlayout_encode_layoutreturn(
 	struct pnfs_layout_hdr *,
 	struct xdr_stream *,
-- 
1.7.3.4


* Re: [PATCHSET v5 0/38] pnfs for 2.6.40
  2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
                   ` (37 preceding siblings ...)
  2011-05-23  0:03 ` [PATCH v5 38/38] pnfs-obj: objlayout_encode_layoutcommit implementation Benny Halevy
@ 2011-05-23  0:22 ` Benny Halevy
  38 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23  0:22 UTC (permalink / raw)
  To: Trond Myklebust, Boaz Harrosh; +Cc: NFS list

On 2011-05-23 02:43, Benny Halevy wrote:
> This version includes review comment fixes on top of Boaz v4 sent yesterday.

I forgot to mention that this patchset is based on v2.6.39.
I pushed the pnfs-submit branch to
git://linux-nfs.org/~bhalevy/linux-pnfs.git
This patchset is under the pnfs-for-2.6.40-v5-2011-05-23 tag.

Benny


* Re: [PATCH v5 20/38] pnfs: per mount layout driver private data
  2011-05-22 23:57 ` [PATCH v5 20/38] pnfs: per mount layout driver private data Benny Halevy
@ 2011-05-23  4:38   ` Boaz Harrosh
  2011-05-23 13:36     ` Benny Halevy
  0 siblings, 1 reply; 51+ messages in thread
From: Boaz Harrosh @ 2011-05-23  4:38 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Trond Myklebust, linux-nfs

On 05/23/2011 02:57 AM, Benny Halevy wrote:
> With the objects layout security model we have object capabilities
> that are associated with the layout and we anticipate that the server
> will issue a cb_layoutrecall for any setattr that changes security
> related attributes (user/group/mode/acl) or truncates the file.
> Therefore, the client returns the layout in advance to avoid the
> extra layout recall.
> 

This looks like the wrong text. It belongs to that other patch.

The title and actual patch do match

Boaz
> [get rid of ds_[rw]size]
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> ---
>  include/linux/nfs_fs_sb.h |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 87694ca..66e031f 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -143,6 +143,7 @@ struct nfs_server {
>  						   filesystem */
>  	struct pnfs_layoutdriver_type  *pnfs_curr_ld; /* Active layout driver */
>  	struct rpc_wait_queue	roc_rpcwaitq;
> +	void			       *pnfs_ld_data; /* Per-mount data */
>  
>  	/* the following fields are protected by nfs_client->cl_lock */
>  	struct rb_root		state_owners;


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 23/38] SQUASHME: pnfs-obj: use global device cache
  2011-05-22 23:58 ` [PATCH v5 23/38] SQUASHME: pnfs-obj: use global device cache Benny Halevy
@ 2011-05-23  4:52   ` Boaz Harrosh
  2011-05-23 13:44     ` Benny Halevy
  0 siblings, 1 reply; 51+ messages in thread
From: Boaz Harrosh @ 2011-05-23  4:52 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Trond Myklebust, linux-nfs

On 05/23/2011 02:58 AM, Benny Halevy wrote:
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>

Benny, sorry, but NACK on the global device cache for now.

This is too late at this stage. We have decided that the first implementation will
use the private cache, and we'll postpone these cleanups for later.

All other code has been well tested for years; all this is new code, and
new behaviour that we will not have time to test. I do not like the
code as it is, because currently it will release the device on layout_return.
Where is the cache? There is much more work to do here!

We already said not to do this in this merge. Why the change of heart?
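
The objection above can be illustrated with a small sketch: if the only
references to a device entry are the ones held by layout segments, the entry
goes away with the last segment and nothing stays cached. Below is a minimal
userspace sketch of the alternative, in which the cache holds one reference of
its own. All names (dev_entry, dev_put, dev_cache_add, dev_cache_purge) are
hypothetical and are not the kernel's nfs4_deviceid_node API; the counter is
not atomic, so this only models the single-threaded logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cache entry; not the kernel's struct nfs4_deviceid_node. */
struct dev_entry {
	int ref;      /* user references, plus one while cached */
	bool cached;  /* true while the cache holds its own reference */
	bool freed;   /* set instead of calling free(), so it can be inspected */
};

/* Drop one reference; the entry is released when the count hits zero. */
static void dev_put(struct dev_entry *e)
{
	assert(e->ref > 0);
	if (--e->ref == 0)
		e->freed = true;
}

/* Insert into the cache: one reference for the cache, one for the caller. */
static void dev_cache_add(struct dev_entry *e)
{
	e->ref = 2;
	e->cached = true;
	e->freed = false;
}

/* Drop the cache's own reference, e.g. when the client is torn down. */
static void dev_cache_purge(struct dev_entry *e)
{
	if (e->cached) {
		e->cached = false;
		dev_put(e);
	}
}
```

Under this assumption, the put done when a layout segment is freed (as
objio_free_lseg does in the quoted patch) would leave the entry in place
until an explicit purge.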

Boaz

> ---
>  fs/nfs/objlayout/objio_osd.c |  102 ++++++++++++++++++++----------------------
>  1 files changed, 49 insertions(+), 53 deletions(-)
> 
> diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
> index 752bf7a..bcc8468 100644
> --- a/fs/nfs/objlayout/objio_osd.c
> +++ b/fs/nfs/objlayout/objio_osd.c
> @@ -46,66 +46,55 @@
>  
>  #define _LLU(x) ((unsigned long long)x)
>  
> -/* A per mountpoint struct currently for device cache */
> -struct objio_mount_type {
> -	struct list_head dev_list;
> -	spinlock_t dev_list_lock;
> -};
> -
> -struct _dev_ent {
> -	struct list_head list;
> -	struct nfs4_deviceid d_id;
> +struct objio_dev_ent {
> +	struct nfs4_deviceid_node id_node;
>  	struct osd_dev *od;
>  };
>  
> -static struct osd_dev *___dev_list_find(struct objio_mount_type *omt,
> -	struct nfs4_deviceid *d_id)
> +static void
> +objio_free_deviceid_node(struct nfs4_deviceid_node *d)
>  {
> -	struct list_head *le;
> +	struct objio_dev_ent *de = container_of(d, struct objio_dev_ent, id_node);
>  
> -	list_for_each(le, &omt->dev_list) {
> -		struct _dev_ent *de = list_entry(le, struct _dev_ent, list);
> -
> -		if (0 == memcmp(&de->d_id, d_id, sizeof(*d_id)))
> -			return de->od;
> -	}
> -
> -	return NULL;
> +	osduld_put_device(de->od);
> +	kfree(de);
>  }
>  
> -static struct osd_dev *_dev_list_find(struct objio_mount_type *omt,
> -	struct nfs4_deviceid *d_id)
> +static struct objio_dev_ent *_dev_list_find(const struct nfs_client *clp,
> +	const struct nfs4_deviceid *d_id)
>  {
> -	struct osd_dev *od;
> +	struct nfs4_deviceid_node *d;
>  
> -	spin_lock(&omt->dev_list_lock);
> -	od = ___dev_list_find(omt, d_id);
> -	spin_unlock(&omt->dev_list_lock);
> -	return od;
> +	d = nfs4_find_get_deviceid(clp, d_id);
> +	if (!d)
> +		return NULL;
> +	return container_of(d, struct objio_dev_ent, id_node);
>  }
>  
> -static int _dev_list_add(struct objio_mount_type *omt,
> -	struct nfs4_deviceid *d_id, struct osd_dev *od,
> +static int _dev_list_add(const struct nfs_server *nfss,
> +	const struct nfs4_deviceid *d_id, struct osd_dev *od,
>  	gfp_t gfp_flags)
>  {
> -	struct _dev_ent *de = kzalloc(sizeof(*de), gfp_flags);
> +	struct nfs4_deviceid_node *d;
> +	struct objio_dev_ent *de = kzalloc(sizeof(*de), gfp_flags);
> +	struct objio_dev_ent *n;
>  
>  	if (!de)
>  		return -ENOMEM;
>  
> -	spin_lock(&omt->dev_list_lock);
> +	nfs4_init_deviceid_node(&de->id_node,
> +				nfss->pnfs_curr_ld,
> +				nfss->nfs_client,
> +				d_id);
> +	de->od = od;
>  
> -	if (___dev_list_find(omt, d_id)) {
> -		kfree(de);
> -		goto out;
> +	d = nfs4_insert_deviceid_node(&de->id_node);
> +	n = container_of(d, struct objio_dev_ent, id_node);
> +	if (n != de) {
> +		BUG_ON(n->od != od);
> +		objio_free_deviceid_node(&de->id_node);
>  	}
>  
> -	de->d_id = *d_id;
> -	de->od = od;
> -	list_add(&de->list, &omt->dev_list);
> -
> -out:
> -	spin_unlock(&omt->dev_list_lock);
>  	return 0;
>  }
>  
> @@ -128,7 +117,7 @@ struct objio_segment {
>  	unsigned comps_index;
>  	unsigned num_comps;
>  	/* variable length */
> -	struct osd_dev	*ods[1];
> +	struct objio_dev_ent *ods[1];
>  };
>  
>  static inline struct objio_segment *
> @@ -139,23 +128,22 @@ OBJIO_LSEG(struct pnfs_layout_segment *lseg)
>  
>  /* Send and wait for a get_device_info of devices in the layout,
>     then look them up with the osd_initiator library */
> -static struct osd_dev *_device_lookup(struct pnfs_layout_hdr *pnfslay,
> +static struct objio_dev_ent *_device_lookup(struct pnfs_layout_hdr *pnfslay,
>  				struct objio_segment *objio_seg, unsigned comp,
>  				gfp_t gfp_flags)
>  {
>  	struct pnfs_osd_deviceaddr *deviceaddr;
>  	struct nfs4_deviceid *d_id;
> +	struct objio_dev_ent *ode;
>  	struct osd_dev *od;
>  	struct osd_dev_info odi;
> -	struct objio_mount_type *omt =
> -				   NFS_SERVER(pnfslay->plh_inode)->pnfs_ld_data;
>  	int err;
>  
>  	d_id = &objio_seg->comps[comp].oc_object_id.oid_device_id;
>  
> -	od = _dev_list_find(omt, d_id);
> -	if (od)
> -		return od;
> +	ode = _dev_list_find(NFS_SERVER(pnfslay->plh_inode)->nfs_client, d_id);
> +	if (ode)
> +		return ode;
>  
>  	err = objlayout_get_deviceinfo(pnfslay, d_id, &deviceaddr, gfp_flags);
>  	if (unlikely(err)) {
> @@ -188,7 +176,7 @@ static struct osd_dev *_device_lookup(struct pnfs_layout_hdr *pnfslay,
>  		goto out;
>  	}
>  
> -	_dev_list_add(omt, d_id, od, gfp_flags);
> +	_dev_list_add(NFS_SERVER(pnfslay->plh_inode), d_id, od, gfp_flags);
>  
>  out:
>  	dprintk("%s: return=%d\n", __func__, err);
> @@ -205,14 +193,14 @@ static int objio_devices_lookup(struct pnfs_layout_hdr *pnfslay,
>  
>  	/* lookup all devices */
>  	for (i = 0; i < objio_seg->num_comps; i++) {
> -		struct osd_dev *od;
> +		struct objio_dev_ent *ode;
>  
> -		od = _device_lookup(pnfslay, objio_seg, i, gfp_flags);
> -		if (unlikely(IS_ERR(od))) {
> -			err = PTR_ERR(od);
> +		ode = _device_lookup(pnfslay, objio_seg, i, gfp_flags);
> +		if (unlikely(IS_ERR(ode))) {
> +			err = PTR_ERR(ode);
>  			goto out;
>  		}
> -		objio_seg->ods[i] = od;
> +		objio_seg->ods[i] = ode;
>  	}
>  	err = 0;
>  
> @@ -348,8 +336,14 @@ err:
>  
>  void objio_free_lseg(struct pnfs_layout_segment *lseg)
>  {
> +	int i;
>  	struct objio_segment *objio_seg = OBJIO_LSEG(lseg);
>  
> +	for (i = 0; i < objio_seg->num_comps; i++) {
> +		if (!objio_seg->ods[i])
> +			break;
> +		nfs4_put_deviceid_node(&objio_seg->ods[i]->id_node);
> +	}
>  	kfree(objio_seg);
>  }
>  
> @@ -360,6 +354,8 @@ static struct pnfs_layoutdriver_type objlayout_type = {
>  
>  	.alloc_lseg              = objlayout_alloc_lseg,
>  	.free_lseg               = objlayout_free_lseg,
> +
> +	.free_deviceid_node	 = objio_free_deviceid_node,
>  };
>  
>  MODULE_DESCRIPTION("pNFS Layout Driver for OSD2 objects");


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 30/38] SQUASHME: initialize data->task on the non-rpc io done success paths
  2011-05-23  0:01 ` [PATCH v5 30/38] SQUASHME: initialize data->task on the non-rpc io done success paths Benny Halevy
@ 2011-05-23  4:58   ` Boaz Harrosh
  2011-05-23 13:47     ` Benny Halevy
  0 siblings, 1 reply; 51+ messages in thread
From: Boaz Harrosh @ 2011-05-23  4:58 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Trond Myklebust, linux-nfs

On 05/23/2011 03:01 AM, Benny Halevy wrote:
> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> ---
>  fs/nfs/pnfs.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 0f59802..d39fcca 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1064,6 +1064,7 @@ pnfs_ld_write_done(struct nfs_write_data *data)
>  
>  	if (!data->pnfs_error) {
>  		pnfs_set_layoutcommit(data);
> +		memset(&data->task, 0, sizeof(data->task));

What?
We used this data->task to get here. See objlayout.c::objlayout_read/write_done().
Why do you think it is invalid?

>  		data->mds_ops->rpc_call_done(&data->task, data);
>  		data->mds_ops->rpc_release(data);
>  		return 0;
> @@ -1113,6 +1114,7 @@ pnfs_ld_read_done(struct nfs_read_data *data)
>  
>  	if (!data->pnfs_error) {
>  		__nfs4_read_done_cb(data);
> +		memset(&data->task, 0, sizeof(data->task));

Same here; this is called on the task thread.

>  		data->mds_ops->rpc_call_done(&data->task, data);
>  		data->mds_ops->rpc_release(data);
>  		return 0;

Boaz

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PTACH] SQUASHME: pnfs-obj: Important fallout from the last rebase
  2011-05-23  0:01 ` [PATCH v5 31/38] pnfs-obj: osd raid engine read/write implementation Benny Halevy
@ 2011-05-23 10:44   ` Boaz Harrosh
  2011-05-23 13:53     ` Benny Halevy
  0 siblings, 1 reply; 51+ messages in thread
From: Boaz Harrosh @ 2011-05-23 10:44 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Trond Myklebust, linux-nfs


On the last rebase I dropped this hunk; it needs to go into:
	[PATCH v5 31/38] pnfs-obj: osd raid engine read/write implementation

Without this the code does not work.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 105d8a6..74b5d3f 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -989,19 +990,6 @@ ssize_t objio_write_pagelist(struct objlayout_io_state *ol_state, bool stable)
 	return _write_exec(ios);
 }
 
-/*
- * objlayout_pg_test(). Called by nfs_can_coalesce_requests()
- *
- * return 1 :  coalesce page
- * return 0 :  don't coalesce page
- */
-int
-objlayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
-		   struct nfs_page *req)
-{
-	return 1;
-}
-
 static struct pnfs_layoutdriver_type objlayout_type = {
 	.id = LAYOUT_OSD2_OBJECTS,
 	.name = "LAYOUT_OSD2_OBJECTS",
@@ -1021,8 +1009,6 @@ static struct pnfs_layoutdriver_type objlayout_type = {
 
 	.encode_layoutcommit	 = objlayout_encode_layoutcommit,
 	.encode_layoutreturn     = objlayout_encode_layoutreturn,
-
-	.pg_test		= objlayout_pg_test,
 };
 
 void *objio_init_mt(void)

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 20/38] pnfs: per mount layout driver private data
  2011-05-23  4:38   ` Boaz Harrosh
@ 2011-05-23 13:36     ` Benny Halevy
  0 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23 13:36 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Trond Myklebust, linux-nfs

On 2011-05-23 07:38, Boaz Harrosh wrote:
> On 05/23/2011 02:57 AM, Benny Halevy wrote:
>> With the objects layout security model we have object capabilities
>> that are associated with the layout and we anticipate that the server
>> will issue a cb_layoutrecall for any setattr that changes security
>> related attributes (user/group/mode/acl) or truncates the file.
>> Therefore, the client returns the layout in advance to avoid the
>> extra layout recall.
>>
> 
> This looks like the wrong text. It belongs to that other patch.

Hmm, looks like a pilot error... thanks!

Benny

> 
> The title and actual patch do match
> 
> Boaz
>> [get rid of ds_[rw]size]
>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
>> ---
>>  include/linux/nfs_fs_sb.h |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
>> index 87694ca..66e031f 100644
>> --- a/include/linux/nfs_fs_sb.h
>> +++ b/include/linux/nfs_fs_sb.h
>> @@ -143,6 +143,7 @@ struct nfs_server {
>>  						   filesystem */
>>  	struct pnfs_layoutdriver_type  *pnfs_curr_ld; /* Active layout driver */
>>  	struct rpc_wait_queue	roc_rpcwaitq;
>> +	void			       *pnfs_ld_data; /* Per-mount data */
>>  
>>  	/* the following fields are protected by nfs_client->cl_lock */
>>  	struct rb_root		state_owners;
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 23/38] SQUASHME: pnfs-obj: use global device cache
  2011-05-23  4:52   ` Boaz Harrosh
@ 2011-05-23 13:44     ` Benny Halevy
  2011-05-23 20:53       ` Boaz Harrosh
  0 siblings, 1 reply; 51+ messages in thread
From: Benny Halevy @ 2011-05-23 13:44 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Trond Myklebust, linux-nfs

On 2011-05-23 07:52, Boaz Harrosh wrote:
> On 05/23/2011 02:58 AM, Benny Halevy wrote:
>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> 
> Benny, sorry, but NACK on the global device cache for now.
> 
> This is too late at this stage. We have decided that the first implementation will
> use the private cache, and we'll postpone these cleanups for later.
> 
> All other code has been well tested for years; all this is new code, and 

The file layout device cache is upstream and had better be harnessed for
other layout drivers as well.  If it's inferior to the current objects
layout cache, we should fix and improve the former rather than introduce
a new implementation.

> new behaviour that we will not have time to test. I do not like the

Ideally, the code should already be fully tested, but last-minute
review-related changes will always require further testing, which needs
to take place during the -rc cycle.  The whole point of having rc's is
to stabilize the merged code to the point where it can be released as a
stable release.

> code as it is, because currently it will release the device on layout_return.
> Where is the cache? There is much more work to do here!
> 

Like I said, if there are bugs we should fix them rather than introducing
alternative code that does the same thing.

> We already said not to do this in this merge. Why the change of heart?

We discussed that again on Thursday's conference call, which you did not
attend.  I decided to take a stab at it to see what a unified cache would
look like, and I rather like the outcome.

Benny

> 
> Boaz
> 
>> ---
>>  fs/nfs/objlayout/objio_osd.c |  102 ++++++++++++++++++++----------------------
>>  1 files changed, 49 insertions(+), 53 deletions(-)
>>
>> diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
>> index 752bf7a..bcc8468 100644
>> --- a/fs/nfs/objlayout/objio_osd.c
>> +++ b/fs/nfs/objlayout/objio_osd.c
>> @@ -46,66 +46,55 @@
>>  
>>  #define _LLU(x) ((unsigned long long)x)
>>  
>> -/* A per mountpoint struct currently for device cache */
>> -struct objio_mount_type {
>> -	struct list_head dev_list;
>> -	spinlock_t dev_list_lock;
>> -};
>> -
>> -struct _dev_ent {
>> -	struct list_head list;
>> -	struct nfs4_deviceid d_id;
>> +struct objio_dev_ent {
>> +	struct nfs4_deviceid_node id_node;
>>  	struct osd_dev *od;
>>  };
>>  
>> -static struct osd_dev *___dev_list_find(struct objio_mount_type *omt,
>> -	struct nfs4_deviceid *d_id)
>> +static void
>> +objio_free_deviceid_node(struct nfs4_deviceid_node *d)
>>  {
>> -	struct list_head *le;
>> +	struct objio_dev_ent *de = container_of(d, struct objio_dev_ent, id_node);
>>  
>> -	list_for_each(le, &omt->dev_list) {
>> -		struct _dev_ent *de = list_entry(le, struct _dev_ent, list);
>> -
>> -		if (0 == memcmp(&de->d_id, d_id, sizeof(*d_id)))
>> -			return de->od;
>> -	}
>> -
>> -	return NULL;
>> +	osduld_put_device(de->od);
>> +	kfree(de);
>>  }
>>  
>> -static struct osd_dev *_dev_list_find(struct objio_mount_type *omt,
>> -	struct nfs4_deviceid *d_id)
>> +static struct objio_dev_ent *_dev_list_find(const struct nfs_client *clp,
>> +	const struct nfs4_deviceid *d_id)
>>  {
>> -	struct osd_dev *od;
>> +	struct nfs4_deviceid_node *d;
>>  
>> -	spin_lock(&omt->dev_list_lock);
>> -	od = ___dev_list_find(omt, d_id);
>> -	spin_unlock(&omt->dev_list_lock);
>> -	return od;
>> +	d = nfs4_find_get_deviceid(clp, d_id);
>> +	if (!d)
>> +		return NULL;
>> +	return container_of(d, struct objio_dev_ent, id_node);
>>  }
>>  
>> -static int _dev_list_add(struct objio_mount_type *omt,
>> -	struct nfs4_deviceid *d_id, struct osd_dev *od,
>> +static int _dev_list_add(const struct nfs_server *nfss,
>> +	const struct nfs4_deviceid *d_id, struct osd_dev *od,
>>  	gfp_t gfp_flags)
>>  {
>> -	struct _dev_ent *de = kzalloc(sizeof(*de), gfp_flags);
>> +	struct nfs4_deviceid_node *d;
>> +	struct objio_dev_ent *de = kzalloc(sizeof(*de), gfp_flags);
>> +	struct objio_dev_ent *n;
>>  
>>  	if (!de)
>>  		return -ENOMEM;
>>  
>> -	spin_lock(&omt->dev_list_lock);
>> +	nfs4_init_deviceid_node(&de->id_node,
>> +				nfss->pnfs_curr_ld,
>> +				nfss->nfs_client,
>> +				d_id);
>> +	de->od = od;
>>  
>> -	if (___dev_list_find(omt, d_id)) {
>> -		kfree(de);
>> -		goto out;
>> +	d = nfs4_insert_deviceid_node(&de->id_node);
>> +	n = container_of(d, struct objio_dev_ent, id_node);
>> +	if (n != de) {
>> +		BUG_ON(n->od != od);
>> +		objio_free_deviceid_node(&de->id_node);
>>  	}
>>  
>> -	de->d_id = *d_id;
>> -	de->od = od;
>> -	list_add(&de->list, &omt->dev_list);
>> -
>> -out:
>> -	spin_unlock(&omt->dev_list_lock);
>>  	return 0;
>>  }
>>  
>> @@ -128,7 +117,7 @@ struct objio_segment {
>>  	unsigned comps_index;
>>  	unsigned num_comps;
>>  	/* variable length */
>> -	struct osd_dev	*ods[1];
>> +	struct objio_dev_ent *ods[1];
>>  };
>>  
>>  static inline struct objio_segment *
>> @@ -139,23 +128,22 @@ OBJIO_LSEG(struct pnfs_layout_segment *lseg)
>>  
>>  /* Send and wait for a get_device_info of devices in the layout,
>>     then look them up with the osd_initiator library */
>> -static struct osd_dev *_device_lookup(struct pnfs_layout_hdr *pnfslay,
>> +static struct objio_dev_ent *_device_lookup(struct pnfs_layout_hdr *pnfslay,
>>  				struct objio_segment *objio_seg, unsigned comp,
>>  				gfp_t gfp_flags)
>>  {
>>  	struct pnfs_osd_deviceaddr *deviceaddr;
>>  	struct nfs4_deviceid *d_id;
>> +	struct objio_dev_ent *ode;
>>  	struct osd_dev *od;
>>  	struct osd_dev_info odi;
>> -	struct objio_mount_type *omt =
>> -				   NFS_SERVER(pnfslay->plh_inode)->pnfs_ld_data;
>>  	int err;
>>  
>>  	d_id = &objio_seg->comps[comp].oc_object_id.oid_device_id;
>>  
>> -	od = _dev_list_find(omt, d_id);
>> -	if (od)
>> -		return od;
>> +	ode = _dev_list_find(NFS_SERVER(pnfslay->plh_inode)->nfs_client, d_id);
>> +	if (ode)
>> +		return ode;
>>  
>>  	err = objlayout_get_deviceinfo(pnfslay, d_id, &deviceaddr, gfp_flags);
>>  	if (unlikely(err)) {
>> @@ -188,7 +176,7 @@ static struct osd_dev *_device_lookup(struct pnfs_layout_hdr *pnfslay,
>>  		goto out;
>>  	}
>>  
>> -	_dev_list_add(omt, d_id, od, gfp_flags);
>> +	_dev_list_add(NFS_SERVER(pnfslay->plh_inode), d_id, od, gfp_flags);
>>  
>>  out:
>>  	dprintk("%s: return=%d\n", __func__, err);
>> @@ -205,14 +193,14 @@ static int objio_devices_lookup(struct pnfs_layout_hdr *pnfslay,
>>  
>>  	/* lookup all devices */
>>  	for (i = 0; i < objio_seg->num_comps; i++) {
>> -		struct osd_dev *od;
>> +		struct objio_dev_ent *ode;
>>  
>> -		od = _device_lookup(pnfslay, objio_seg, i, gfp_flags);
>> -		if (unlikely(IS_ERR(od))) {
>> -			err = PTR_ERR(od);
>> +		ode = _device_lookup(pnfslay, objio_seg, i, gfp_flags);
>> +		if (unlikely(IS_ERR(ode))) {
>> +			err = PTR_ERR(ode);
>>  			goto out;
>>  		}
>> -		objio_seg->ods[i] = od;
>> +		objio_seg->ods[i] = ode;
>>  	}
>>  	err = 0;
>>  
>> @@ -348,8 +336,14 @@ err:
>>  
>>  void objio_free_lseg(struct pnfs_layout_segment *lseg)
>>  {
>> +	int i;
>>  	struct objio_segment *objio_seg = OBJIO_LSEG(lseg);
>>  
>> +	for (i = 0; i < objio_seg->num_comps; i++) {
>> +		if (!objio_seg->ods[i])
>> +			break;
>> +		nfs4_put_deviceid_node(&objio_seg->ods[i]->id_node);
>> +	}
>>  	kfree(objio_seg);
>>  }
>>  
>> @@ -360,6 +354,8 @@ static struct pnfs_layoutdriver_type objlayout_type = {
>>  
>>  	.alloc_lseg              = objlayout_alloc_lseg,
>>  	.free_lseg               = objlayout_free_lseg,
>> +
>> +	.free_deviceid_node	 = objio_free_deviceid_node,
>>  };
>>  
>>  MODULE_DESCRIPTION("pNFS Layout Driver for OSD2 objects");
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 30/38] SQUASHME: initialize data->task on the non-rpc io done success paths
  2011-05-23  4:58   ` Boaz Harrosh
@ 2011-05-23 13:47     ` Benny Halevy
  0 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23 13:47 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Trond Myklebust, linux-nfs

On 2011-05-23 07:58, Boaz Harrosh wrote:
> On 05/23/2011 03:01 AM, Benny Halevy wrote:
>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
>> ---
>>  fs/nfs/pnfs.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 0f59802..d39fcca 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -1064,6 +1064,7 @@ pnfs_ld_write_done(struct nfs_write_data *data)
>>  
>>  	if (!data->pnfs_error) {
>>  		pnfs_set_layoutcommit(data);
>> +		memset(&data->task, 0, sizeof(data->task));
> 
> What?
> We used this data->task to get here. See objlayout.c::objlayout_read/write_done().
> Why do you think it is invalid?
>

You're right.
Apparently I got too paranoid last night.
The initialization is in nfs_{read,write}data_alloc.

Thanks!

Benny

>>  		data->mds_ops->rpc_call_done(&data->task, data);
>>  		data->mds_ops->rpc_release(data);
>>  		return 0;
>> @@ -1113,6 +1114,7 @@ pnfs_ld_read_done(struct nfs_read_data *data)
>>  
>>  	if (!data->pnfs_error) {
>>  		__nfs4_read_done_cb(data);
>> +		memset(&data->task, 0, sizeof(data->task));
> 
> Same here; this is called on the task thread.
> 
>>  		data->mds_ops->rpc_call_done(&data->task, data);
>>  		data->mds_ops->rpc_release(data);
>>  		return 0;
> 
> Boaz


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PTACH] SQUASHME: pnfs-obj: Important fallout from the last rebase
  2011-05-23 10:44   ` [PTACH] SQUASHME: pnfs-obj: Important fallout from the last rebase Boaz Harrosh
@ 2011-05-23 13:53     ` Benny Halevy
  0 siblings, 0 replies; 51+ messages in thread
From: Benny Halevy @ 2011-05-23 13:53 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Trond Myklebust, linux-nfs

On 2011-05-23 13:44, Boaz Harrosh wrote:
> 
> On the last rebase I dropped this hunk; it needs to go into:
> 	[PATCH v5 31/38] pnfs-obj: osd raid engine read/write implementation
> 
> Without this the code does not work.

I agree that in the current implementation the layout driver (ld) must
implement pg_test for pnfs to work.
This patch is reversed, though...
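
To illustrate why the hook has to be present, here is a minimal userspace
sketch of a driver ops table with a pg_test callback and a generic caller that
goes through it. The types and names (page_req, layout_driver_ops,
always_coalesce, can_coalesce) are hypothetical simplifications; the real hook
in the quoted hunk takes a struct nfs_pageio_descriptor and two struct nfs_page
pointers. The -1 return for a missing hook is an assumption of this sketch,
not kernel behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page request. */
struct page_req {
	unsigned long index;
};

/* Hypothetical ops table, loosely modeled on the pg_test hook. */
struct layout_driver_ops {
	/* return 1: coalesce req with prev; return 0: do not coalesce */
	int (*pg_test)(const struct page_req *prev,
		       const struct page_req *req);
};

/* Trivial hook, mirroring the "return 1" body in the quoted hunk. */
static int always_coalesce(const struct page_req *prev,
			   const struct page_req *req)
{
	(void)prev;
	(void)req;
	return 1;
}

/* Generic caller: returns -1 to flag a driver that left the hook unset. */
static int can_coalesce(const struct layout_driver_ops *ops,
			const struct page_req *prev,
			const struct page_req *req)
{
	assert(ops != NULL);
	if (!ops->pg_test)
		return -1;
	return ops->pg_test(prev, req);
}
```

A caller that invoked ops->pg_test without the check would dereference a NULL
function pointer for any driver that does not set the hook.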

Benny

> 
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> ---
> diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
> index 105d8a6..74b5d3f 100644
> --- a/fs/nfs/objlayout/objio_osd.c
> +++ b/fs/nfs/objlayout/objio_osd.c
> @@ -989,19 +990,6 @@ ssize_t objio_write_pagelist(struct objlayout_io_state *ol_state, bool stable)
>  	return _write_exec(ios);
>  }
>  
> -/*
> - * objlayout_pg_test(). Called by nfs_can_coalesce_requests()
> - *
> - * return 1 :  coalesce page
> - * return 0 :  don't coalesce page
> - */
> -int
> -objlayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
> -		   struct nfs_page *req)
> -{
> -	return 1;
> -}
> -
>  static struct pnfs_layoutdriver_type objlayout_type = {
>  	.id = LAYOUT_OSD2_OBJECTS,
>  	.name = "LAYOUT_OSD2_OBJECTS",
> @@ -1021,8 +1009,6 @@ static struct pnfs_layoutdriver_type objlayout_type = {
>  
>  	.encode_layoutcommit	 = objlayout_encode_layoutcommit,
>  	.encode_layoutreturn     = objlayout_encode_layoutreturn,
> -
> -	.pg_test		= objlayout_pg_test,
>  };
>  
>  void *objio_init_mt(void)


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 23/38] SQUASHME: pnfs-obj: use global device cache
  2011-05-23 13:44     ` Benny Halevy
@ 2011-05-23 20:53       ` Boaz Harrosh
  2011-05-23 21:59         ` [PATCH] SQUASHME: Bugs in new global-device-cache code Boaz Harrosh
  0 siblings, 1 reply; 51+ messages in thread
From: Boaz Harrosh @ 2011-05-23 20:53 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Trond Myklebust, linux-nfs

On 05/23/2011 04:44 PM, Benny Halevy wrote:
> On 2011-05-23 07:52, Boaz Harrosh wrote:
>> On 05/23/2011 02:58 AM, Benny Halevy wrote:
>>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
>>
>> Benny, sorry, but NACK on the global device cache for now.
>>
>> This is too late at this stage. We have decided that the first implementation will
>> use the private cache, and we'll postpone these cleanups for later.
>>
>> All other code has been well tested for years; all this is new code, and 
> 
> The file layout device cache is upstream and had better be harnessed for
> other layout drivers as well.  If it's inferior to the current objects
> layout cache, we should fix and improve the former rather than introduce
> a new implementation.
> 
>> new behaviour that we will not have time to test. I do not like the
> 
> Ideally, the code should already be fully tested, but last-minute
> review-related changes will always require further testing, which needs
> to take place during the -rc cycle.  The whole point of having rc's is
> to stabilize the merged code to the point where it can be released as a
> stable release.
> 
>> code as it is, because currently it will release the device on layout_return.
>> Where is the cache? There is much more work to do here!
>>
> 
> Like I said, if there are bugs we should fix them rather than introducing
> alternative code that does the same thing.
> 
>> We already said not to do this in this merge. Why the change of heart?
> 
> We discussed that again on Thursday's conference call, which you did not
> attend.  I decided to take a stab at it to see what a unified cache would
> look like, and I rather like the outcome.
> 
> Benny
> 

As I suspected, this is crashing all over the place. I was finally able
to rebase a server on your latest code (and find out what I was missing).

I have been fighting with it for two hours now, and I don't see the end of it.

Again, I'm not at all against this work. I have my own version of this
in my tree. I am only against the timing. Your latest code cannot go in now.
It is too new!

What do you want to do now? Postpone pnfs-objects to the next kernel, or put
in good solid code that was tested and has worked for a long time?

I'm not sleeping nights and have worked all through both weekends. Now it is
for nothing. Thanks a lot!

Boaz

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH] SQUASHME: Bugs in new global-device-cache code
  2011-05-23 20:53       ` Boaz Harrosh
@ 2011-05-23 21:59         ` Boaz Harrosh
  2011-05-23 22:31           ` Boaz Harrosh
  0 siblings, 1 reply; 51+ messages in thread
From: Boaz Harrosh @ 2011-05-23 21:59 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Trond Myklebust, linux-nfs


With this I'm able to do IO.

It is on top of Benny's code and my bug fixes
 + a merge fallout between the two.

Benny's current top plus all the fixes has two problems:
1. Very small IOs, both reads and writes.
   How/where do we set rsize/wsize?
2. Something funny that I'm still investigating. When I do
   a small IO of a couple of requests, the devices get freed
   at the end, on release of the layout (which is never
   layout_returned).
   But when I do very large IO with lots of concurrent requests,
   the devices are not released at all; they stay in the
   cache. I am still investigating.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/nfs/objlayout/objio_osd.c |   31 ++++++++++++++++++++++---------
 fs/nfs/pnfs_dev.c            |    2 +-
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 5210913..83aa088 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -60,6 +60,7 @@ objio_free_deviceid_node(struct nfs4_deviceid_node *d)
 {
 	struct objio_dev_ent *de = container_of(d, struct objio_dev_ent, id_node);
 
+	dprintk("%s: free od=%p\n", __func__, de->od);
 	osduld_put_device(de->od);
 	kfree(de);
 }
@@ -68,14 +69,19 @@ static struct objio_dev_ent *_dev_list_find(const struct nfs_server *nfss,
 	const struct nfs4_deviceid *d_id)
 {
 	struct nfs4_deviceid_node *d;
+	struct objio_dev_ent *de;
 
 	d = nfs4_find_get_deviceid(nfss->pnfs_curr_ld, nfss->nfs_client, d_id);
 	if (!d)
 		return NULL;
-	return container_of(d, struct objio_dev_ent, id_node);
+
+	de = container_of(d, struct objio_dev_ent, id_node);
+	dprintk("%s: found od=%p\n", __func__, de->od);
+	return de;
 }
 
-static int _dev_list_add(const struct nfs_server *nfss,
+static struct objio_dev_ent *
+_dev_list_add(const struct nfs_server *nfss,
 	const struct nfs4_deviceid *d_id, struct osd_dev *od,
 	gfp_t gfp_flags)
 {
@@ -83,9 +89,12 @@ static int _dev_list_add(const struct nfs_server *nfss,
 	struct objio_dev_ent *de = kzalloc(sizeof(*de), gfp_flags);
 	struct objio_dev_ent *n;
 
-	if (!de)
-		return -ENOMEM;
+	if (!de) {
+		dprintk("%s: -ENOMEM od=%p\n", __func__, od);
+		return NULL;
+	}
 
+	dprintk("%s: Adding od=%p\n", __func__, od);
 	nfs4_init_deviceid_node(&de->id_node,
 				nfss->pnfs_curr_ld,
 				nfss->nfs_client,
@@ -95,11 +104,13 @@ static int _dev_list_add(const struct nfs_server *nfss,
 	d = nfs4_insert_deviceid_node(&de->id_node);
 	n = container_of(d, struct objio_dev_ent, id_node);
 	if (n != de) {
-		BUG_ON(n->od != od);
+/*		BUG_ON(n->od != od);*/
+		dprintk("%s: Race with other n->od=%p\n", __func__, n->od);
 		objio_free_deviceid_node(&de->id_node);
+		de = n;
 	}
 
-	return 0;
+	return de;
 }
 
 struct caps_buffers {
@@ -121,7 +132,7 @@ struct objio_segment {
 	unsigned comps_index;
 	unsigned num_comps;
 	/* variable length */
-	struct objio_dev_ent *ods[0];
+	struct objio_dev_ent *ods[];
 };
 
 static inline struct objio_segment *
@@ -205,12 +216,13 @@ static struct objio_dev_ent *_device_lookup(struct pnfs_layout_hdr *pnfslay,
 		goto out;
 	}
 
-	_dev_list_add(NFS_SERVER(pnfslay->plh_inode), d_id, od, gfp_flags);
+	ode = _dev_list_add(NFS_SERVER(pnfslay->plh_inode), d_id, od,
+			    gfp_flags);
 
 out:
 	dprintk("%s: return=%d\n", __func__, err);
 	objlayout_put_deviceinfo(deviceaddr);
-	return err ? ERR_PTR(err) : od;
+	return err ? ERR_PTR(err) : ode;
 }
 
 static int objio_devices_lookup(struct pnfs_layout_hdr *pnfslay,
@@ -230,6 +242,7 @@ static int objio_devices_lookup(struct pnfs_layout_hdr *pnfslay,
 			goto out;
 		}
 		objio_seg->ods[i] = ode;
+		dprintk("%s: ods[%d] = %p\n", __func__, i, ode->od);
 	}
 	err = 0;
 
diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index 7997899..7e5542c 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -100,7 +100,7 @@ _find_get_deviceid(const struct pnfs_layoutdriver_type *ld,
 
 	rcu_read_lock();
 	d = _lookup_deviceid(ld, clp, id, hash);
-	if (!atomic_inc_not_zero(&d->ref))
+	if (!d || !atomic_inc_not_zero(&d->ref))
 		d = NULL;
 	rcu_read_unlock();
 	return d;
-- 
1.7.2.3
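
The pnfs_dev.c hunk above guards against a lookup miss before touching the
reference count. A minimal userspace sketch of that pattern, using C11 atomics
in place of the kernel's atomic_inc_not_zero() and a flat array in place of the
hash table, might look like the following. The names node, inc_not_zero,
lookup and find_get are hypothetical, and RCU protection is omitted entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node; stands in for a refcounted cache entry. */
struct node {
	int id;
	atomic_int ref;
};

/* Take a reference unless the count has already dropped to zero. */
static bool inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Linear lookup; returns NULL when the id is not present. */
static struct node *lookup(struct node *tbl, size_t n, int id)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == id)
			return &tbl[i];
	return NULL;
}

/* Check for a miss first, then try to take a reference. */
static struct node *find_get(struct node *tbl, size_t n, int id)
{
	struct node *d = lookup(tbl, n, id);

	if (!d || !inc_not_zero(&d->ref))
		return NULL;
	return d;
}
```

Without the !d test, a miss would pass a NULL-derived pointer to the refcount
helper, which matches the crash the one-line fix addresses.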



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH] SQUASHME: Bugs in new global-device-cache code
  2011-05-23 21:59         ` [PATCH] SQUASHME: Bugs in new global-device-cache code Boaz Harrosh
@ 2011-05-23 22:31           ` Boaz Harrosh
  0 siblings, 0 replies; 51+ messages in thread
From: Boaz Harrosh @ 2011-05-23 22:31 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Trond Myklebust, linux-nfs

On 05/24/2011 12:59 AM, Boaz Harrosh wrote:
> 2. Something funny that I'm still investigating. When I do
>    a small IO of a couple of requests, the devices get freed
>    at the end, on release of the layout (which is never
>    layout_returned).
>    But when I do very large IO with lots of concurrent requests,
>    the devices are not released at all; they stay in the
>    cache. I am still investigating.

OK, that was a stupid print problem. The buffer was full, so the prints
did not show.

It's fine, I guess I need to sleep ;-)

Boaz


end of thread, other threads:[~2011-05-23 22:31 UTC | newest]

Thread overview: 51+ messages
2011-05-22 23:43 [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
2011-05-22 23:45 ` [PATCH v5 01/38] NFSv4.1: use struct nfs_client to qualify deviceid Benny Halevy
2011-05-22 23:45 ` [PATCH v5 02/38] pnfs: resolve header dependency in pnfs.h Benny Halevy
2011-05-22 23:45 ` [PATCH v5 03/38] NFSv4.1: make deviceid cache global Benny Halevy
2011-05-22 23:46 ` [PATCH v5 04/38] NFSv4.1: purge deviceid cache on nfs_free_client Benny Halevy
2011-05-22 23:46 ` [PATCH v5 05/38] pnfs: CB_NOTIFY_DEVICEID Benny Halevy
2011-05-22 23:46 ` [PATCH v5 06/38] SQUASHME: use be32 res in nfs4_callback_devicenotify Benny Halevy
2011-05-22 23:47 ` [PATCH v5 07/38] SQUASHME: pnfs: use nfs_client to qualify deviceid for cb_notify_deviceid Benny Halevy
2011-05-22 23:47 ` [PATCH v5 08/38] SQUASHME: pnfs: use global deviceid cache for CB_NOTIFY_DEVICEID Benny Halevy
2011-05-22 23:48 ` [PATCH v5 09/38] SQUASHME: pnfs: refactor device cache _lookup_deviceid Benny Halevy
2011-05-22 23:49 ` [PATCH v5 10/38] SQUASHME: pnfs: refactor device cache _find_get_deviceid Benny Halevy
2011-05-22 23:50 ` [PATCH v5 11/38] SUNRPC: introduce xdr_init_decode_pages Benny Halevy
2011-05-22 23:51 ` [PATCH v5 12/38] pnfs: Use byte-range for layoutget Benny Halevy
2011-05-22 23:51 ` [PATCH v5 13/38] pnfs: align layoutget requests on page boundaries Benny Halevy
2011-05-22 23:52 ` [PATCH v5 14/38] pnfs: Use byte-range for cb_layoutrecall Benny Halevy
2011-05-22 23:53 ` [PATCH v5 15/38] pnfs: client stats Benny Halevy
2011-05-22 23:54 ` [PATCH v5 16/38] pnfs-obj: objlayoutdriver module skeleton Benny Halevy
2011-05-22 23:55 ` [PATCH v5 17/38] pnfs-obj: pnfs_osd XDR definitions Benny Halevy
2011-05-22 23:56 ` [PATCH v5 18/38] pnfs-obj: pnfs_osd XDR client implementation Benny Halevy
2011-05-22 23:57 ` [PATCH v5 19/38] pnfs-obj: decode layout, alloc/free lseg Benny Halevy
2011-05-22 23:57 ` [PATCH v5 20/38] pnfs: per mount layout driver private data Benny Halevy
2011-05-23  4:38   ` Boaz Harrosh
2011-05-23 13:36     ` Benny Halevy
2011-05-22 23:57 ` [PATCH v5 21/38] pnfs-obj: objio_osd device information retrieval and caching Benny Halevy
2011-05-22 23:58 ` [PATCH v5 22/38] pnfs: set/unset layoutdriver Benny Halevy
2011-05-22 23:58 ` [PATCH v5 23/38] SQUASHME: pnfs-obj: use global device cache Benny Halevy
2011-05-23  4:52   ` Boaz Harrosh
2011-05-23 13:44     ` Benny Halevy
2011-05-23 20:53       ` Boaz Harrosh
2011-05-23 21:59         ` [PATCH] SQUASHME: Bugs in new global-device-cache code Boaz Harrosh
2011-05-23 22:31           ` Boaz Harrosh
2011-05-22 23:59 ` [PATCH v5 24/38] SQUASHME: Revert "pnfs: per mount layout driver private data" Benny Halevy
2011-05-22 23:59 ` [PATCH v5 25/38] SQUASHME: Revert "pnfs: set/unset layoutdriver" Benny Halevy
2011-05-22 23:59 ` [PATCH v5 26/38] NFSv4.1: use layout driver in global device cache Benny Halevy
2011-05-23  0:00 ` [PATCH v5 27/38] pnfs: alloc and free layout_hdr layoutdriver methods Benny Halevy
2011-05-23  0:00 ` [PATCH v5 28/38] pnfs-obj: define per-inode private structure Benny Halevy
2011-05-23  0:00 ` [PATCH v5 29/38] pnfs: support for non-rpc layout drivers Benny Halevy
2011-05-23  0:01 ` [PATCH v5 30/38] SQUASHME: initialize data->task on the non-rpc io done success paths Benny Halevy
2011-05-23  4:58   ` Boaz Harrosh
2011-05-23 13:47     ` Benny Halevy
2011-05-23  0:01 ` [PATCH v5 31/38] pnfs-obj: osd raid engine read/write implementation Benny Halevy
2011-05-23 10:44   ` [PTACH] SQUASHME: pnfs-obj: Important fallout from the last rebase Boaz Harrosh
2011-05-23 13:53     ` Benny Halevy
2011-05-23  0:01 ` [PATCH v5 32/38] pnfs: layoutreturn Benny Halevy
2011-05-23  0:02 ` [PATCH v5 33/38] SQUASHME: pnfs: fix layout stateid in layoutreturn args Benny Halevy
2011-05-23  0:02 ` [PATCH v5 34/38] pnfs: layoutret_on_setattr Benny Halevy
2011-05-23  0:02 ` [PATCH v5 35/38] pnfs: encode_layoutreturn Benny Halevy
2011-05-23  0:02 ` [PATCH v5 36/38] pnfs-obj: report errors and .encode_layoutreturn Implementation Benny Halevy
2011-05-23  0:03 ` [PATCH v5 37/38] pnfs: encode_layoutcommit Benny Halevy
2011-05-23  0:03 ` [PATCH v5 38/38] pnfs-obj: objlayout_encode_layoutcommit implementation Benny Halevy
2011-05-23  0:22 ` [PATCHSET v5 0/38] pnfs for 2.6.40 Benny Halevy
