* nfs: desynchronized value of nfs_i.ncommit.
@ 2007-04-17 17:01             ` OGAWA Hirofumi
  2007-04-17 22:44               ` Trond Myklebust
  2007-04-18  1:19               ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Trond Myklebust
  0 siblings, 2 replies; 43+ messages in thread
From: OGAWA Hirofumi @ 2007-04-17 17:01 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: linux-kernel

Hi,

I got the following message today. It probably happened under heavy load.

NFS: desynchronized value of nfs_i.ncommit.
NFS: desynchronized value of nfs_i.ncommit.
NFS: desynchronized value of nfs_i.ncommit.
NFS: desynchronized value of nfs_i.ncommit.
NFS: desynchronized value of nfs_i.ncommit.
NFS: desynchronized value of nfs_i.ncommit.
NFS: desynchronized value of nfs_i.ncommit.
[...]

Any idea? Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: nfs: desynchronized value of nfs_i.ncommit.
  2007-04-17 17:01             ` nfs: desynchronized value of nfs_i.ncommit OGAWA Hirofumi
@ 2007-04-17 22:44               ` Trond Myklebust
  2007-04-18  1:19               ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Trond Myklebust
  1 sibling, 0 replies; 43+ messages in thread
From: Trond Myklebust @ 2007-04-17 22:44 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: linux-kernel

On Wed, 2007-04-18 at 02:01 +0900, OGAWA Hirofumi wrote:
> Hi,
> 
> I got the following message today. It probably happened under heavy load.
> 
> NFS: desynchronized value of nfs_i.ncommit.
> NFS: desynchronized value of nfs_i.ncommit.
> NFS: desynchronized value of nfs_i.ncommit.
> NFS: desynchronized value of nfs_i.ncommit.
> NFS: desynchronized value of nfs_i.ncommit.
> NFS: desynchronized value of nfs_i.ncommit.
> NFS: desynchronized value of nfs_i.ncommit.
> [...]
> 
> Any idea? Thanks.

It is a known issue that is due to nfs_writepage() 'stealing' requests
from the commit list. The most dangerous consequence is that it screws
up the NR_UNSTABLE_NFS page accounting...

I'm working on a fix, and will post it to lkml as soon as I'm happy with
tests.

Cheers
  Trond

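The kind of accounting drift described in the reply above can be sketched in userspace C. This is only a toy model, not the kernel code: every name below (toy_req, toy_inode, toy_steal_to_dirty and so on) is hypothetical, and the exact path by which nfs_writepage() takes a request off the commit list is not shown in this thread. The sketch assumes only that a request leaves the commit list while the counters that are supposed to track that list are left untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; all names are hypothetical. */
enum toy_list { TOY_ON_NONE, TOY_ON_DIRTY, TOY_ON_COMMIT };

struct toy_req {
	enum toy_list where;        /* which list the request currently sits on */
};

struct toy_inode {
	unsigned long ndirty;       /* should equal the length of the dirty list */
	unsigned long ncommit;      /* should equal the length of the commit list */
	unsigned long unstable;     /* stand-in for the NR_UNSTABLE_NFS page count */
};

/* Queue a request for COMMIT: list membership and counters move together. */
static void toy_mark_commit(struct toy_inode *ino, struct toy_req *req)
{
	req->where = TOY_ON_COMMIT;
	ino->ncommit++;
	ino->unstable++;
}

/*
 * The problematic pattern (an assumption for illustration): the request is
 * moved to the dirty list, but the commit-side accounting is not undone.
 */
static void toy_steal_to_dirty(struct toy_inode *ino, struct toy_req *req)
{
	req->where = TOY_ON_DIRTY;
	ino->ndirty++;
	/* Missing on purpose: ino->ncommit-- and ino->unstable--. */
}

/* Count how many of the n requests really are on the commit list. */
static unsigned long toy_commit_list_len(const struct toy_req *reqs, size_t n)
{
	unsigned long len = 0;

	for (size_t i = 0; i < n; i++)
		if (reqs[i].where == TOY_ON_COMMIT)
			len++;
	return len;
}

/* True when the counter no longer matches the list it is meant to track. */
static bool toy_desynchronized(const struct toy_inode *ino,
			       const struct toy_req *reqs, size_t n)
{
	return ino->ncommit != toy_commit_list_len(reqs, n);
}
```

In this model, queueing two requests for commit and then "stealing" one leaves ncommit at 2 while only one request remains on the commit list, and the stand-in for NR_UNSTABLE_NFS is off by one as well.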

* [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-17 17:01             ` nfs: desynchronized value of nfs_i.ncommit OGAWA Hirofumi
  2007-04-17 22:44               ` Trond Myklebust
@ 2007-04-18  1:19               ` Trond Myklebust
  2007-04-18  1:29                 ` [PATCH 1/4] NFS: clean up the unstable write code Trond Myklebust
                                   ` (5 more replies)
  1 sibling, 6 replies; 43+ messages in thread
From: Trond Myklebust @ 2007-04-18  1:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Florin Iucha, Andrew Morton, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

I've split the issues introduced by the 2.6.21-rcX write code up into 4
subproblems.

The first patch is just a cleanup in order to ease review.

Patch number 2 ensures that we never release the PG_writeback flag until
_after_ we've either discarded the unstable request altogether, or put it
on the nfs_inode's commit or dirty lists.

Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
may be redirtied.

Patch number 4 protects the NFS '.set_page_dirty' address_space operation
against races with nfs_inode_add_request.

Cheers
  Trond


* [PATCH 1/4] NFS: clean up the unstable write code
  2007-04-18  1:19               ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Trond Myklebust
@ 2007-04-18  1:29                 ` Trond Myklebust
  2007-04-18  1:29                 ` [PATCH 2/4] NFS: Don't clear PG_writeback until after we've processed unstable writes Trond Myklebust
                                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 43+ messages in thread
From: Trond Myklebust @ 2007-04-18  1:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Florin Iucha, Andrew Morton, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Get rid of the inlined #ifdefs.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/write.c           |  117 ++++++++++++++++++++++++++++------------------
 include/linux/nfs_page.h |   30 ------------
 2 files changed, 71 insertions(+), 76 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ad2e91b..3ed4feb 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -460,6 +460,43 @@ nfs_mark_request_commit(struct nfs_page *req)
 	inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
 	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
 }
+
+static inline
+int nfs_write_need_commit(struct nfs_write_data *data)
+{
+	return data->verf.committed != NFS_FILE_SYNC;
+}
+
+static inline
+int nfs_reschedule_unstable_write(struct nfs_page *req)
+{
+	if (test_and_clear_bit(PG_NEED_COMMIT, &req->wb_flags)) {
+		nfs_mark_request_commit(req);
+		return 1;
+	}
+	if (test_and_clear_bit(PG_NEED_RESCHED, &req->wb_flags)) {
+		nfs_redirty_request(req);
+		return 1;
+	}
+	return 0;
+}
+#else
+static inline void
+nfs_mark_request_commit(struct nfs_page *req)
+{
+}
+
+static inline
+int nfs_write_need_commit(struct nfs_write_data *data)
+{
+	return 0;
+}
+
+static inline
+int nfs_reschedule_unstable_write(struct nfs_page *req)
+{
+	return 0;
+}
 #endif
 
 /*
@@ -746,26 +783,12 @@ int nfs_updatepage(struct file *file, struct page *page,
 
 static void nfs_writepage_release(struct nfs_page *req)
 {
-	nfs_end_page_writeback(req->wb_page);
 
-#if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
-	if (!PageError(req->wb_page)) {
-		if (NFS_NEED_RESCHED(req)) {
-			nfs_redirty_request(req);
-			goto out;
-		} else if (NFS_NEED_COMMIT(req)) {
-			nfs_mark_request_commit(req);
-			goto out;
-		}
-	}
-	nfs_inode_remove_request(req);
-
-out:
-	nfs_clear_commit(req);
-	nfs_clear_reschedule(req);
-#else
-	nfs_inode_remove_request(req);
-#endif
+	if (PageError(req->wb_page) || !nfs_reschedule_unstable_write(req)) {
+		nfs_end_page_writeback(req->wb_page);
+		nfs_inode_remove_request(req);
+	} else
+		nfs_end_page_writeback(req->wb_page);
 	nfs_clear_page_writeback(req);
 }
 
@@ -1008,22 +1031,28 @@ static void nfs_writeback_done_partial(struct rpc_task *task, void *calldata)
 		nfs_set_pageerror(page);
 		req->wb_context->error = task->tk_status;
 		dprintk(", error = %d\n", task->tk_status);
-	} else {
-#if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
-		if (data->verf.committed < NFS_FILE_SYNC) {
-			if (!NFS_NEED_COMMIT(req)) {
-				nfs_defer_commit(req);
-				memcpy(&req->wb_verf, &data->verf, sizeof(req->wb_verf));
-				dprintk(" defer commit\n");
-			} else if (memcmp(&req->wb_verf, &data->verf, sizeof(req->wb_verf))) {
-				nfs_defer_reschedule(req);
-				dprintk(" server reboot detected\n");
-			}
-		} else
-#endif
-			dprintk(" OK\n");
+		goto out;
 	}
 
+	if (nfs_write_need_commit(data)) {
+		spinlock_t *req_lock = &NFS_I(page->mapping->host)->req_lock;
+
+		spin_lock(req_lock);
+		if (test_bit(PG_NEED_RESCHED, &req->wb_flags)) {
+			/* Do nothing we need to resend the writes */
+		} else if (!test_and_set_bit(PG_NEED_COMMIT, &req->wb_flags)) {
+			memcpy(&req->wb_verf, &data->verf, sizeof(req->wb_verf));
+			dprintk(" defer commit\n");
+		} else if (memcmp(&req->wb_verf, &data->verf, sizeof(req->wb_verf))) {
+			set_bit(PG_NEED_RESCHED, &req->wb_flags);
+			clear_bit(PG_NEED_COMMIT, &req->wb_flags);
+			dprintk(" server reboot detected\n");
+		}
+		spin_unlock(req_lock);
+	} else
+		dprintk(" OK\n");
+
+out:
 	if (atomic_dec_and_test(&req->wb_complete))
 		nfs_writepage_release(req);
 }
@@ -1064,25 +1093,21 @@ static void nfs_writeback_done_full(struct rpc_task *task, void *calldata)
 		if (task->tk_status < 0) {
 			nfs_set_pageerror(page);
 			req->wb_context->error = task->tk_status;
-			nfs_end_page_writeback(page);
-			nfs_inode_remove_request(req);
 			dprintk(", error = %d\n", task->tk_status);
-			goto next;
+			goto remove_request;
 		}
-		nfs_end_page_writeback(page);
 
-#if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
-		if (data->args.stable != NFS_UNSTABLE || data->verf.committed == NFS_FILE_SYNC) {
-			nfs_inode_remove_request(req);
-			dprintk(" OK\n");
+		if (nfs_write_need_commit(data)) {
+			memcpy(&req->wb_verf, &data->verf, sizeof(req->wb_verf));
+			nfs_mark_request_commit(req);
+			nfs_end_page_writeback(page);
+			dprintk(" marked for commit\n");
 			goto next;
 		}
-		memcpy(&req->wb_verf, &data->verf, sizeof(req->wb_verf));
-		nfs_mark_request_commit(req);
-		dprintk(" marked for commit\n");
-#else
+		dprintk(" OK\n");
+remove_request:
+		nfs_end_page_writeback(page);
 		nfs_inode_remove_request(req);
-#endif
 	next:
 		nfs_clear_page_writeback(req);
 	}
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index d111be6..16b0266 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -49,8 +49,6 @@ struct nfs_page {
 };
 
 #define NFS_WBACK_BUSY(req)	(test_bit(PG_BUSY,&(req)->wb_flags))
-#define NFS_NEED_COMMIT(req)	(test_bit(PG_NEED_COMMIT,&(req)->wb_flags))
-#define NFS_NEED_RESCHED(req)	(test_bit(PG_NEED_RESCHED,&(req)->wb_flags))
 
 extern	struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
 					    struct inode *inode,
@@ -121,34 +119,6 @@ nfs_list_remove_request(struct nfs_page *req)
 	req->wb_list_head = NULL;
 }
 
-static inline int
-nfs_defer_commit(struct nfs_page *req)
-{
-	return !test_and_set_bit(PG_NEED_COMMIT, &req->wb_flags);
-}
-
-static inline void
-nfs_clear_commit(struct nfs_page *req)
-{
-	smp_mb__before_clear_bit();
-	clear_bit(PG_NEED_COMMIT, &req->wb_flags);
-	smp_mb__after_clear_bit();
-}
-
-static inline int
-nfs_defer_reschedule(struct nfs_page *req)
-{
-	return !test_and_set_bit(PG_NEED_RESCHED, &req->wb_flags);
-}
-
-static inline void
-nfs_clear_reschedule(struct nfs_page *req)
-{
-	smp_mb__before_clear_bit();
-	clear_bit(PG_NEED_RESCHED, &req->wb_flags);
-	smp_mb__after_clear_bit();
-}
-
 static inline struct nfs_page *
 nfs_list_entry(struct list_head *head)
 {


* [PATCH 2/4] NFS: Don't clear PG_writeback until after we've processed unstable writes
  2007-04-18  1:19               ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Trond Myklebust
  2007-04-18  1:29                 ` [PATCH 1/4] NFS: clean up the unstable write code Trond Myklebust
@ 2007-04-18  1:29                 ` Trond Myklebust
  2007-04-18  1:29                 ` [PATCH 3/4] NFS: Fix the 'desynchronized value of nfs_i.ncommit' error Trond Myklebust
                                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 43+ messages in thread
From: Trond Myklebust @ 2007-04-18  1:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Florin Iucha, Andrew Morton, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Ensure that we don't release the PG_writeback lock until after the page has
either been redirtied, or queued on the nfs_inode 'commit' list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/write.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 3ed4feb..8e94246 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -920,8 +920,8 @@ out_bad:
 		list_del(&data->pages);
 		nfs_writedata_release(data);
 	}
-	nfs_end_page_writeback(req->wb_page);
 	nfs_redirty_request(req);
+	nfs_end_page_writeback(req->wb_page);
 	nfs_clear_page_writeback(req);
 	return -ENOMEM;
 }
@@ -966,8 +966,8 @@ static int nfs_flush_one(struct inode *inode, struct list_head *head, int how)
 	while (!list_empty(head)) {
 		struct nfs_page *req = nfs_list_entry(head->next);
 		nfs_list_remove_request(req);
-		nfs_end_page_writeback(req->wb_page);
 		nfs_redirty_request(req);
+		nfs_end_page_writeback(req->wb_page);
 		nfs_clear_page_writeback(req);
 	}
 	return -ENOMEM;
@@ -1002,8 +1002,8 @@ out_err:
 	while (!list_empty(head)) {
 		req = nfs_list_entry(head->next);
 		nfs_list_remove_request(req);
-		nfs_end_page_writeback(req->wb_page);
 		nfs_redirty_request(req);
+		nfs_end_page_writeback(req->wb_page);
 		nfs_clear_page_writeback(req);
 	}
 	return error;

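One way to read the reordering in the hunks above is as an invariant: a page should always be either under writeback or queued on one of the inode's lists, with no window in which it is neither. The userspace sketch below models just that invariant. It is an interpretation rather than the kernel code; toy_page and the toy_* functions are hypothetical names, and the real race involves concurrent threads, which this single-threaded model only approximates by checking the state between steps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two facts that matter for the ordering. */
struct toy_page {
	bool writeback;   /* stand-in for PG_writeback */
	bool queued;      /* request sits on the dirty or commit list */
};

/* A page is "tracked" while it is under writeback or sits on a list. */
static bool toy_tracked(const struct toy_page *p)
{
	return p->writeback || p->queued;
}

/*
 * Old order: end writeback first, then requeue the request.
 * Returns true only if the page stayed tracked after every step.
 */
static bool toy_finish_old_order(struct toy_page *p)
{
	bool always_tracked = toy_tracked(p);

	p->writeback = false;                    /* end page writeback */
	always_tracked = always_tracked && toy_tracked(p);
	p->queued = true;                        /* redirty or queue for commit */
	always_tracked = always_tracked && toy_tracked(p);
	return always_tracked;
}

/* New order: requeue the request first, then end writeback. */
static bool toy_finish_new_order(struct toy_page *p)
{
	bool always_tracked = toy_tracked(p);

	p->queued = true;                        /* redirty or queue for commit */
	always_tracked = always_tracked && toy_tracked(p);
	p->writeback = false;                    /* end page writeback */
	always_tracked = always_tracked && toy_tracked(p);
	return always_tracked;
}
```

Both orders end in the same final state. Only the old order passes through an intermediate state where the page is neither under writeback nor queued, which is the window the patch description asks to close.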

* [PATCH 3/4] NFS: Fix the 'desynchronized value of nfs_i.ncommit' error
  2007-04-18  1:19               ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Trond Myklebust
  2007-04-18  1:29                 ` [PATCH 1/4] NFS: clean up the unstable write code Trond Myklebust
  2007-04-18  1:29                 ` [PATCH 2/4] NFS: Don't clear PG_writeback until after we've processed unstable writes Trond Myklebust
@ 2007-04-18  1:29                 ` Trond Myklebust
  2007-04-18  1:29                 ` [PATCH 4/4] NFS: Fix race in nfs_set_page_dirty Trond Myklebust
                                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 43+ messages in thread
From: Trond Myklebust @ 2007-04-18  1:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Florin Iucha, Andrew Morton, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Redirtying a request that is already marked for commit will screw up the
accounting for NR_UNSTABLE_NFS as well as nfs_i.ncommit.
Ensure that all requests on the commit queue are labelled with the
PG_NEED_COMMIT flag, and avoid moving them onto the dirty list inside
nfs_page_mark_flush().

Also inline nfs_mark_request_dirty() into nfs_page_mark_flush() for
atomicity reasons. Avoid dropping the spinlock until we're done marking the
request in the radix tree and have added it to the ->dirty list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/write.c |   47 ++++++++++++++++++++++-------------------------
 1 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 8e94246..ce5b4a9 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -38,7 +38,6 @@
 static struct nfs_page * nfs_update_request(struct nfs_open_context*,
 					    struct page *,
 					    unsigned int, unsigned int);
-static void nfs_mark_request_dirty(struct nfs_page *req);
 static long nfs_flush_mapping(struct address_space *mapping, struct writeback_control *wbc, int how);
 static const struct rpc_call_ops nfs_write_partial_ops;
 static const struct rpc_call_ops nfs_write_full_ops;
@@ -255,7 +254,8 @@ static void nfs_end_page_writeback(struct page *page)
 static int nfs_page_mark_flush(struct page *page)
 {
 	struct nfs_page *req;
-	spinlock_t *req_lock = &NFS_I(page->mapping->host)->req_lock;
+	struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+	spinlock_t *req_lock = &nfsi->req_lock;
 	int ret;
 
 	spin_lock(req_lock);
@@ -279,11 +279,23 @@ static int nfs_page_mark_flush(struct page *page)
 			return ret;
 		spin_lock(req_lock);
 	}
-	spin_unlock(req_lock);
+	if (test_bit(PG_NEED_COMMIT, &req->wb_flags)) {
+		/* This request is marked for commit */
+		spin_unlock(req_lock);
+		nfs_unlock_request(req);
+		return 1;
+	}
 	if (nfs_set_page_writeback(page) == 0) {
 		nfs_list_remove_request(req);
-		nfs_mark_request_dirty(req);
-	}
+		/* add the request to the inode's dirty list. */
+		radix_tree_tag_set(&nfsi->nfs_page_tree,
+				req->wb_index, NFS_PAGE_TAG_DIRTY);
+		nfs_list_add_request(req, &nfsi->dirty);
+		nfsi->ndirty++;
+		spin_unlock(req_lock);
+		__mark_inode_dirty(page->mapping->host, I_DIRTY_PAGES);
+	} else
+		spin_unlock(req_lock);
 	ret = test_bit(PG_NEED_FLUSH, &req->wb_flags);
 	nfs_unlock_request(req);
 	return ret;
@@ -406,24 +418,6 @@ static void nfs_inode_remove_request(struct nfs_page *req)
 	nfs_release_request(req);
 }
 
-/*
- * Add a request to the inode's dirty list.
- */
-static void
-nfs_mark_request_dirty(struct nfs_page *req)
-{
-	struct inode *inode = req->wb_context->dentry->d_inode;
-	struct nfs_inode *nfsi = NFS_I(inode);
-
-	spin_lock(&nfsi->req_lock);
-	radix_tree_tag_set(&nfsi->nfs_page_tree,
-			req->wb_index, NFS_PAGE_TAG_DIRTY);
-	nfs_list_add_request(req, &nfsi->dirty);
-	nfsi->ndirty++;
-	spin_unlock(&nfsi->req_lock);
-	__mark_inode_dirty(inode, I_DIRTY_PAGES);
-}
-
 static void
 nfs_redirty_request(struct nfs_page *req)
 {
@@ -438,7 +432,7 @@ nfs_dirty_request(struct nfs_page *req)
 {
 	struct page *page = req->wb_page;
 
-	if (page == NULL)
+	if (page == NULL || test_bit(PG_NEED_COMMIT, &req->wb_flags))
 		return 0;
 	return !PageWriteback(req->wb_page);
 }
@@ -456,6 +450,7 @@ nfs_mark_request_commit(struct nfs_page *req)
 	spin_lock(&nfsi->req_lock);
 	nfs_list_add_request(req, &nfsi->commit);
 	nfsi->ncommit++;
+	set_bit(PG_NEED_COMMIT, &(req)->wb_flags);
 	spin_unlock(&nfsi->req_lock);
 	inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
 	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
@@ -470,7 +465,7 @@ int nfs_write_need_commit(struct nfs_write_data *data)
 static inline
 int nfs_reschedule_unstable_write(struct nfs_page *req)
 {
-	if (test_and_clear_bit(PG_NEED_COMMIT, &req->wb_flags)) {
+	if (test_bit(PG_NEED_COMMIT, &req->wb_flags)) {
 		nfs_mark_request_commit(req);
 		return 1;
 	}
@@ -557,6 +552,7 @@ static void nfs_cancel_commit_list(struct list_head *head)
 		req = nfs_list_entry(head->next);
 		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
 		nfs_list_remove_request(req);
+		clear_bit(PG_NEED_COMMIT, &(req)->wb_flags);
 		nfs_inode_remove_request(req);
 		nfs_unlock_request(req);
 	}
@@ -1295,6 +1291,7 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata)
 	while (!list_empty(&data->pages)) {
 		req = nfs_list_entry(data->pages.next);
 		nfs_list_remove_request(req);
+		clear_bit(PG_NEED_COMMIT, &(req)->wb_flags);
 		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
 
 		dprintk("NFS: commit (%s/%Ld %d@%Ld)",

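The rule this patch introduces (a request carrying PG_NEED_COMMIT is on the commit list and must not be moved to the dirty list) can be sketched in userspace C. This is a minimal model under stated assumptions: TOY_NEED_COMMIT, toy_req, toy_inode and the toy_* functions are hypothetical names, and the req_lock spinlock that the real code holds around these updates is left out, so the sketch is single-threaded only.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NEED_COMMIT (1u << 0)   /* stand-in for PG_NEED_COMMIT */

struct toy_req {
	unsigned int flags;
	bool on_dirty;
	bool on_commit;
};

struct toy_inode {
	unsigned long ndirty;
	unsigned long ncommit;
};

/* Put the request on the commit list and label it, as the patch does. */
static void toy_mark_request_commit(struct toy_inode *ino, struct toy_req *req)
{
	req->on_commit = true;
	req->flags |= TOY_NEED_COMMIT;
	ino->ncommit++;
}

/*
 * Try to move the request to the dirty list. A request labelled for commit
 * is refused, so the commit-side counter cannot drift. Returns true if the
 * request is on the dirty list afterwards.
 */
static bool toy_try_redirty(struct toy_inode *ino, struct toy_req *req)
{
	if (req->flags & TOY_NEED_COMMIT)
		return false;
	if (!req->on_dirty) {
		req->on_dirty = true;
		ino->ndirty++;
	}
	return true;
}

/* Commit finished: take the request off the list and drop the label. */
static void toy_commit_done(struct toy_inode *ino, struct toy_req *req)
{
	req->on_commit = false;
	req->flags &= ~TOY_NEED_COMMIT;
	ino->ncommit--;
}
```

In the model, a redirty attempt between toy_mark_request_commit() and toy_commit_done() is rejected and leaves both counters unchanged. Once the commit completes, the same call succeeds.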

* [PATCH 4/4] NFS: Fix race in nfs_set_page_dirty
  2007-04-18  1:19               ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Trond Myklebust
                                   ` (2 preceding siblings ...)
  2007-04-18  1:29                 ` [PATCH 3/4] NFS: Fix the 'desynchronized value of nfs_i.ncommit' error Trond Myklebust
@ 2007-04-18  1:29                 ` Trond Myklebust
  2007-04-18  2:58                 ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Andrew Morton
  2007-04-18  8:19                 ` Peter Zijlstra
  5 siblings, 0 replies; 43+ messages in thread
From: Trond Myklebust @ 2007-04-18  1:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Florin Iucha, Andrew Morton, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Protect nfs_set_page_dirty() against races with nfs_inode_add_request.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/write.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ce5b4a9..7975589 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -388,6 +388,8 @@ static int nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
 	}
 	SetPagePrivate(req->wb_page);
 	set_page_private(req->wb_page, (unsigned long)req);
+	if (PageDirty(req->wb_page))
+		set_bit(PG_NEED_FLUSH, &req->wb_flags);
 	nfsi->npages++;
 	atomic_inc(&req->wb_count);
 	return 0;
@@ -407,6 +409,8 @@ static void nfs_inode_remove_request(struct nfs_page *req)
 	set_page_private(req->wb_page, 0);
 	ClearPagePrivate(req->wb_page);
 	radix_tree_delete(&nfsi->nfs_page_tree, req->wb_index);
+	if (test_and_clear_bit(PG_NEED_FLUSH, &req->wb_flags))
+		__set_page_dirty_nobuffers(req->wb_page);
 	nfsi->npages--;
 	if (!nfsi->npages) {
 		spin_unlock(&nfsi->req_lock);
@@ -1527,15 +1531,22 @@ int nfs_wb_page(struct inode *inode, struct page* page)
 
 int nfs_set_page_dirty(struct page *page)
 {
+	spinlock_t *req_lock = &NFS_I(page->mapping->host)->req_lock;
 	struct nfs_page *req;
+	int ret;
 
-	req = nfs_page_find_request(page);
+	spin_lock(req_lock);
+	req = nfs_page_find_request_locked(page);
 	if (req != NULL) {
 		/* Mark any existing write requests for flushing */
-		set_bit(PG_NEED_FLUSH, &req->wb_flags);
+		ret = !test_and_set_bit(PG_NEED_FLUSH, &req->wb_flags);
+		spin_unlock(req_lock);
 		nfs_release_request(req);
+		return ret;
 	}
-	return __set_page_dirty_nobuffers(page);
+	ret = __set_page_dirty_nobuffers(page);
+	spin_unlock(req_lock);
+	return ret;
 }
 
 

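The new return value in nfs_set_page_dirty(), "ret = !test_and_set_bit(PG_NEED_FLUSH, ...)", reports 1 only to the caller that actually changed the flag. A userspace analogue using C11 atomics is sketched below. The names toy_test_and_set, toy_mark_need_flush and TOY_NEED_FLUSH are hypothetical. Unlike the kernel helper, which takes a bit number, this version takes a mask, and the req_lock spinlock from the patch is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NEED_FLUSH (1u << 0)    /* stand-in for PG_NEED_FLUSH */

/*
 * Userspace analogue of test_and_set_bit(): atomically set the bits in
 * mask and report whether they were already set beforehand.
 */
static bool toy_test_and_set(atomic_uint *flags, unsigned int mask)
{
	return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/*
 * Same shape as the patched code: returns 1 if this call newly set the
 * flag, 0 if the request was already marked for flushing.
 */
static int toy_mark_need_flush(atomic_uint *flags)
{
	return !toy_test_and_set(flags, TOY_NEED_FLUSH);
}
```

Calling toy_mark_need_flush() twice on the same flags word returns 1 the first time and 0 the second, and the flag stays set throughout.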

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  1:19               ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Trond Myklebust
                                   ` (3 preceding siblings ...)
  2007-04-18  1:29                 ` [PATCH 4/4] NFS: Fix race in nfs_set_page_dirty Trond Myklebust
@ 2007-04-18  2:58                 ` Andrew Morton
  2007-04-18  3:06                   ` Trond Myklebust
  2007-04-18  8:19                 ` Peter Zijlstra
  5 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2007-04-18  2:58 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Peter Zijlstra, Linus Torvalds, Florin Iucha, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Tue, 17 Apr 2007 21:19:46 -0400 Trond Myklebust <Trond.Myklebust@netapp.com> wrote:

> 
> I've split the issues introduced by the 2.6.21-rcX write code up into 4
> subproblems.
> 
> The first patch is just a cleanup in order to ease review.
> 
> Patch number 2 ensures that we never release the PG_writeback flag until
> _after_ we've either discarded the unstable request altogether, or put it
> on the nfs_inode's commit or dirty lists.
> 
> Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
> uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
> may be redirtied.
> 
> Patch number 4 protects the NFS '.set_page_dirty' address_space operation
> against races with nfs_inode_add_request.

For 2.6.21, yes?

I diligently tried to review this code but alas, it seems that my NFS
knowledge remains not up to the task.  Please avoid buses.

At least it compiles with all the configs I could think of ;)


* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  2:58                 ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Andrew Morton
@ 2007-04-18  3:06                   ` Trond Myklebust
  2007-04-18  3:30                     ` Florin Iucha
  2007-04-18  9:54                     ` OGAWA Hirofumi
  0 siblings, 2 replies; 43+ messages in thread
From: Trond Myklebust @ 2007-04-18  3:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Linus Torvalds, Florin Iucha, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Tue, 2007-04-17 at 19:58 -0700, Andrew Morton wrote:
> On Tue, 17 Apr 2007 21:19:46 -0400 Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> 
> > 
> > I've split the issues introduced by the 2.6.21-rcX write code up into 4
> > subproblems.
> > 
> > The first patch is just a cleanup in order to ease review.
> > 
> > Patch number 2 ensures that we never release the PG_writeback flag until
> > _after_ we've either discarded the unstable request altogether, or put it
> > on the nfs_inode's commit or dirty lists.
> > 
> > Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
> > uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
> > may be redirtied.
> > 
> > Patch number 4 protects the NFS '.set_page_dirty' address_space operation
> > against races with nfs_inode_add_request.
> 
> For 2.6.21, yes?

Right. A couple of nasty regressions have been sighted. This series
attempts to deal with them all.

> I diligently tried to review this code but alas, it seems that my NFS
> knowledge remains not up to the task.  Please avoid buses.
> 
> At least it compiles with all the configs I could think of ;)

I was mainly interested in feedback from Peter, Florin and Ogawa-san to
find out if this series fixes their problems. You were unfortunate
enough to have been on earlier Ccs, so I didn't dare trim you off. :-)

Cheers
  Trond


* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  3:06                   ` Trond Myklebust
@ 2007-04-18  3:30                     ` Florin Iucha
  2007-04-18  3:54                       ` Trond Myklebust
  2007-04-18  9:54                     ` OGAWA Hirofumi
  1 sibling, 1 reply; 43+ messages in thread
From: Florin Iucha @ 2007-04-18  3:30 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Andrew Morton, Peter Zijlstra, Linus Torvalds, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1500 bytes --]

On Tue, Apr 17, 2007 at 11:06:05PM -0400, Trond Myklebust wrote:
> > > I've split the issues introduced by the 2.6.21-rcX write code up into 4
> > > subproblems.
> > > 
> > > The first patch is just a cleanup in order to ease review.
> > > 
> > > Patch number 2 ensures that we never release the PG_writeback flag until
> > > _after_ we've either discarded the unstable request altogether, or put it
> > > on the nfs_inode's commit or dirty lists.
> > > 
> > > Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
> > > uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
> > > may be redirtied.
> > > 
> > > Patch number 4 protects the NFS '.set_page_dirty' address_space operation
> > > against races with nfs_inode_add_request.
> > 
> > For 2.6.21, yes?
> 
> Right. A couple of nasty regressions have been sighted. This series
> attempts to deal with them all.

The good news is that the Gnome session log-in progresses to the point
where both top and bottom bars are painted (gray) and the bottom bar
is populated with icons (2.6.21-rc7 vanilla stops after displaying the
splash).  The bad news is that it stops there.

Big-copy fails as well, after 2.5G transferred.

The process traces are at:

   http://iucha.net/nfs/21-rc7-nfs1/gnome-session
   http://iucha.net/nfs/21-rc7-nfs1/big-copy

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163



* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  3:30                     ` Florin Iucha
@ 2007-04-18  3:54                       ` Trond Myklebust
  2007-04-18  4:07                         ` Florin Iucha
  0 siblings, 1 reply; 43+ messages in thread
From: Trond Myklebust @ 2007-04-18  3:54 UTC (permalink / raw)
  To: Florin Iucha
  Cc: Andrew Morton, Peter Zijlstra, Linus Torvalds, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Tue, 2007-04-17 at 22:30 -0500, Florin Iucha wrote:
> On Tue, Apr 17, 2007 at 11:06:05PM -0400, Trond Myklebust wrote:
> > > > I've split the issues introduced by the 2.6.21-rcX write code up into 4
> > > > subproblems.
> > > > 
> > > > The first patch is just a cleanup in order to ease review.
> > > > 
> > > > Patch number 2 ensures that we never release the PG_writeback flag until
> > > > _after_ we've either discarded the unstable request altogether, or put it
> > > > on the nfs_inode's commit or dirty lists.
> > > > 
> > > > Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
> > > > uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
> > > > may be redirtied.
> > > > 
> > > > Patch number 4 protects the NFS '.set_page_dirty' address_space operation
> > > > against races with nfs_inode_add_request.
> > > 
> > > For 2.6.21, yes?
> > 
> > Right. A couple of nasty regressions have been sighted. This series
> > attempts to deal with them all.
> 
> The good news is that the Gnome session log-in progresses to the point
> where both top and bottom bars are painted (gray) and the bottom bar
> is populated with icons (2.6.21-rc7 vanilla stops after displaying the
> splash).  The bad news is that it stops there.
> 
> Big-copy fails as well, after 2.5G transferred.
> 
> The process traces are at:
> 
>    http://iucha.net/nfs/21-rc7-nfs1/gnome-session
>    http://iucha.net/nfs/21-rc7-nfs1/big-copy
> 
> Regards,
> florin

Could you tell us a bit more about what happens when these hangs occur?
Does the networking stop too, or just NFS? How about CIFS?

Cheers
  Trond


* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  3:54                       ` Trond Myklebust
@ 2007-04-18  4:07                         ` Florin Iucha
  2007-04-18  4:13                           ` Andrew Morton
  2007-04-18 11:38                           ` Trond Myklebust
  0 siblings, 2 replies; 43+ messages in thread
From: Florin Iucha @ 2007-04-18  4:07 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Andrew Morton, Peter Zijlstra, Linus Torvalds, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1897 bytes --]

On Tue, Apr 17, 2007 at 11:54:45PM -0400, Trond Myklebust wrote:
> > The good news is that the Gnome session log-in progresses to the point
> > where both top and bottom bars are painted (gray) and the bottom bar
> > is populated with icons (2.6.21-rc7 vanilla stops after displaying the
> > splash).  The bad news is that it stops there.
> > 
> > Big-copy fails as well, after 2.5G transferred.
> > 
> > The process traces are at:
> > 
> >    http://iucha.net/nfs/21-rc7-nfs1/gnome-session
> >    http://iucha.net/nfs/21-rc7-nfs1/big-copy
> > 
> > Regards,
> > florin
> 
> Could you tell us a bit more about what happens when these hangs occur?
> Does the networking stop too, or just NFS? How about CIFS?

The networking does not stop, I can ssh into and out of the box
without any problem.

When 'gnome-session' hangs, it does not react to any clicks on the
terminal or browser icons.  If I switch to a virtual console then come
back into X, the panels (or splash) are gray - the icons disappear. I
can switch to a virtual console and give it the three finger salute and
it reboots cleanly.

When 'big-copy' hangs, if I switch to a different console and run
'lsof', '[u]mount', or use shell completion on a network mount then that
process goes into D state.  I cannot umount the network shares nor
stop autofs.  I cannot do a clean reboot, I have to ssh
in and "echo s > /proc/sysrq-trigger; echo u > /proc/sysrq-trigger;
echo b > /proc/sysrq-trigger" .

I am not mounting anything using CIFS, but I could give it a try.

I could transfer 75 GB without hiccup with 2.6.19 using NFS4 and CIFS,
and with 2.6.20 using CIFS.  2.6.20 works fine under reasonably light
load, with gnome sessions logging in and out several times a day.

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163



* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  4:07                         ` Florin Iucha
@ 2007-04-18  4:13                           ` Andrew Morton
  2007-04-18  4:30                             ` Florin Iucha
  2007-04-18 11:38                           ` Trond Myklebust
  1 sibling, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2007-04-18  4:13 UTC (permalink / raw)
  To: Florin Iucha
  Cc: Trond Myklebust, Peter Zijlstra, Linus Torvalds, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Tue, 17 Apr 2007 23:07:30 -0500 florin@iucha.net (Florin Iucha) wrote:

> On Tue, Apr 17, 2007 at 11:54:45PM -0400, Trond Myklebust wrote:
> > > The good news is that the Gnome session log-in progresses to the point
> > > where both top and bottom bars are painted (gray) and the bottom bar
> > > is populated with icons (2.6.21-rc7 vanilla stops after displaying the
> > > splash).  The bad news is that it stops there.
> > > 
> > > Big-copy fails as well, after 2.5G transferred.
> > > 
> > > The process traces are at:
> > > 
> > >    http://iucha.net/nfs/21-rc7-nfs1/gnome-session
> > >    http://iucha.net/nfs/21-rc7-nfs1/big-copy
> > > 
> > > Regards,
> > > florin
> > 
> > Could you tell us a bit more about what happens when these hangs occur?
> > Does the networking stop too, or just NFS? How about CIFS?
> 
> The networking does not stop, I can ssh into and out of the box
> without any problem.
> 
> When 'gnome-session' hangs, it does not react to any clicks on the
> terminal or browser icons.  If I switch to a virtual console then come
> back into X, the panels (or splash) are gray - the icons disappear. I
> can switch to a virtual console and give it the three finger salute and
> it reboots cleanly.
> 
> When 'big-copy' hangs, if I switch to a different console and run
> 'lsof', '[u]mount', or use shell completion on a network mount then that
> process goes into D state.  I cannot umount the network shares nor
> stop autofs.  I cannot do a clean reboot, I have to ssh
> in and "echo s > /proc/sysrq-trigger; echo u > /proc/sysrq-trigger;
> echo b > /proc/sysrq-trigger" .

please, do `echo t > /proc/sysrq-trigger' first, send us the result.


* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  4:13                           ` Andrew Morton
@ 2007-04-18  4:30                             ` Florin Iucha
  2007-04-18  5:14                               ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: Florin Iucha @ 2007-04-18  4:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Trond Myklebust, Peter Zijlstra, Linus Torvalds, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Tue, Apr 17, 2007 at 09:13:50PM -0700, Andrew Morton wrote:
> On Tue, 17 Apr 2007 23:07:30 -0500 florin@iucha.net (Florin Iucha) wrote:
> > > > The process traces are at:
> > > > 
> > > >    http://iucha.net/nfs/21-rc7-nfs1/gnome-session
> > > >    http://iucha.net/nfs/21-rc7-nfs1/big-copy
> 
> please, do `echo t > /proc/sysrq-trigger' first, send us the result.

Already did.  Traces from vanilla kernel at
   http://iucha.net/nfs/21-rc7/big-copy

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  4:30                             ` Florin Iucha
@ 2007-04-18  5:14                               ` Linus Torvalds
  2007-04-18  5:26                                 ` Florin Iucha
  2007-04-18  5:37                                 ` Andrew Morton
  0 siblings, 2 replies; 43+ messages in thread
From: Linus Torvalds @ 2007-04-18  5:14 UTC (permalink / raw)
  To: Florin Iucha
  Cc: Andrew Morton, Trond Myklebust, Peter Zijlstra, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel



On Tue, 17 Apr 2007, Florin Iucha wrote:
> 
> Already did.  Traces from vanilla kernel at
>    http://iucha.net/nfs/21-rc7/big-copy

Well, there's a pdflush in io_schedule_timeout/congestion_wait, and
there's an nfsv4-svc in svc_recv/nfs_callback_svc, and a lot of processes
either just in schedule_timeout or similar "normal" waiting (pollwait
etc).

[ The call traces could be prettier, but sadly, even if you enable frame
  pointers, the x86-64 kernel is too stupid to follow them. So you kind of
  just have to ignore the noise. ]

The triggering process looks like it might be that "cp", it is in the 
__wait_on_bit/sync_page/wait_on_page_bit/wait_on_page_writeback_range/ 
filemap_fdatawait.

Is this a trace from the "big copy" hang, or from a gnome splashscreen 
hang? It *looks* like it's a big copy. Yes/no?

Anyway, looks like the cp did a "utimes()" system call, which triggers
"nfs_setattr()", which in turn triggers the filemap_fdatawait() and some
kind of endless wait. Nothing stands out from the traces, in other words.
Doesn't look like a locking thing, for example - we're not stuck on some 
inode semaphore, we're literally waiting for the page to be written out.

		Linus

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  5:14                               ` Linus Torvalds
@ 2007-04-18  5:26                                 ` Florin Iucha
  2007-04-18  5:37                                 ` Andrew Morton
  1 sibling, 0 replies; 43+ messages in thread
From: Florin Iucha @ 2007-04-18  5:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Trond Myklebust, Peter Zijlstra, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Tue, Apr 17, 2007 at 10:14:02PM -0700, Linus Torvalds wrote:
> On Tue, 17 Apr 2007, Florin Iucha wrote:
> > 
> > Already did.  Traces from vanilla kernel at
> >    http://iucha.net/nfs/21-rc7/big-copy
> 
> Well, there's a pdflush in io_schedule_timeout/congestion_wait, and
> there's an nfsv4-svc in svc_recv/nfs_callback_svc, and a lot of processes
> either just in schedule_timeout or similar "normal" waiting (pollwait
> etc).
>
> [ The call traces could be prettier, but sadly, even if you enable frame
>   pointers, the x86-64 kernel is too stupid to follow them. So you kind of
>   just have to ignore the noise. ]
> 
> The triggering process looks like it might be that "cp", it is in the 
> __wait_on_bit/sync_page/wait_on_page_bit/wait_on_page_writeback_range/ 
> filemap_fdatawait.
> 
> Is this a trace from the "big copy" hang, or from a gnome splashscreen 
> hang? It *looks* like it's a big copy. Yes/no?

It *is* big copy.

I am monitoring the copy on the server, using "iostat 5".  It writes
for a while at a fairly constant pace (on ext3 *), then it drops to 0 in 5-10
seconds and stays there...  Once I left it for a couple of hours and
it did not pick up.

Regards,
florin

(*) XFS exhibited wild ups and downs in the transaction rate, and JFS
starts strong then loses steam and slowly settles to about half ext3's
rate.

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  5:14                               ` Linus Torvalds
  2007-04-18  5:26                                 ` Florin Iucha
@ 2007-04-18  5:37                                 ` Andrew Morton
  2007-04-18 12:38                                   ` Florin Iucha
  2007-04-29 19:41                                   ` Rogier Wolff
  1 sibling, 2 replies; 43+ messages in thread
From: Andrew Morton @ 2007-04-18  5:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Florin Iucha, Trond Myklebust, Peter Zijlstra, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Tue, 17 Apr 2007 22:14:02 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Tue, 17 Apr 2007, Florin Iucha wrote:
> > 
> > Already did.  Traces from vanilla kernel at
> >    http://iucha.net/nfs/21-rc7/big-copy
> 
> Well, there's a pdflush in io_schedule_timeout/congestion_wait, and
> there's an nfsv4-svc in svc_recv/nfs_callback_svc, and a lot of processes
> either just in schedule_timeout or similar "normal" waiting (pollwait
> etc).
>
> [ The call traces could be prettier, but sadly, even if you enable frame
>   pointers, the x86-64 kernel is too stupid to follow them. So you kind of
>   just have to ignore the noise. ]
> 
> The triggering process looks like it might be that "cp", it is in the 
> __wait_on_bit/sync_page/wait_on_page_bit/wait_on_page_writeback_range/ 
> filemap_fdatawait.
> 
> Is this a trace from the "big copy" hang, or from a gnome splashscreen 
> hang? It *looks* like it's a big copy. Yes/no?
> 
> Anyway, looks like the cp did a "utimes()" system call, which triggers
> "nfs_setattr()", which in turn triggers the filemap_fdatawait() and some
> kind of endless wait. Nothing stands out from the traces, in other words.
> Doesn't look like a locking thing, for example - we're not stuck on some 
> inode semaphore, we're literally waiting for the page to be written out.
> 

That's the unpatched kernel.  I think the trace with Trond's patchset
is http://iucha.net/nfs/21-rc7-nfs1/big-copy

[  243.594565] cp            S 00000038b55df454     0  2547   2526 (NOTLB)
[  243.594570]  ffff810078339988 0000000000000086 0000000000000000 00000001000087a9
[  243.594574]  ffff810078339908 ffffffff80162c03 000000000000000a ffff81007f612fe0
[  243.594579]  ffffffff8054c3e0 000000000000063f ffff81007f6131b8 0000000000000292
[  243.594583] Call Trace:
[  243.594587]  [<ffffffff80162c03>] _spin_lock_irqsave+0x11/0x18
[  243.594591]  [<ffffffff8011c0c5>] __mod_timer+0xa9/0xbb
[  243.594595]  [<ffffffff80161283>] schedule_timeout+0x8d/0xb4
[  243.594599]  [<ffffffff8018b8ac>] process_timeout+0x0/0xb
[  243.594603]  [<ffffffff80160bcf>] io_schedule_timeout+0x28/0x33
[  243.594607]  [<ffffffff801a9400>] congestion_wait_interruptible+0x86/0xa3
[  243.594611]  [<ffffffff8019489f>] autoremove_wake_function+0x0/0x38
[  243.594615]  [<ffffffff80418952>] rpc_save_sigmask+0x2f/0x31
[  243.594619]  [<ffffffff80236cfa>] nfs_writepage_setup+0xc8/0x482
[  243.594624]  [<ffffffff80237507>] nfs_updatepage+0x101/0x143
[  243.594628]  [<ffffffff8022da30>] nfs_commit_write+0x2e/0x41
[  243.594633]  [<ffffffff8010fb96>] generic_file_buffered_write+0x530/0x773
[  243.594639]  [<ffffffff8015fae7>] copy_user_generic_string+0x17/0x40
[  243.594643]  [<ffffffff8010cb51>] file_read_actor+0xaa/0x130
[  243.594648]  [<ffffffff80115ee8>] __generic_file_aio_write_nolock+0x38b/0x3fe
[  243.594652]  [<ffffffff8013824c>] debug_mutex_free_waiter+0x5b/0x5f
[  243.594657]  [<ffffffff80120d73>] generic_file_aio_write+0x64/0xc0
[  243.594662]  [<ffffffff8022e128>] nfs_file_write+0xee/0x15a
[  243.594666]  [<ffffffff801175c8>] do_sync_write+0xe2/0x126
[  243.594671]  [<ffffffff8019489f>] autoremove_wake_function+0x0/0x38
[  243.594676]  [<ffffffff80162ab5>] _spin_unlock+0x9/0xb
[  243.594680]  [<ffffffff80116294>] vfs_write+0xae/0x137
[  243.594684]  [<ffffffff80116bea>] sys_write+0x47/0x70
[  243.594688]  [<ffffffff8015ad5e>] system_call+0x7e/0x83

At a guess, bdi_write_congested() is failing to return false.  (nfs_update_request()
got inlined in nfs_writepage_setup(), even though it was defined afterwards?  gcc
got smarter)

We have a stuck pdflush, presumably waiting for dirty+writeback memory to subside:

[  243.594761] pdflush       D 00000038b5643b16     0  2552     11 (L-TLB)
[  243.594766]  ffff81000c617d70 0000000000000046 0000000000000000 00000001000087a9
[  243.594770]  ffff81000c617cf0 ffffffff80162c03 000000000000000a ffff81007e5ff510
[  243.594775]  ffff810002f4a080 0000000000000ae5 ffff81007e5ff6e8 0000000100000282
[  243.594779] Call Trace:
[  243.594783]  [<ffffffff80162c03>] _spin_lock_irqsave+0x11/0x18
[  243.594787]  [<ffffffff8011c0c5>] __mod_timer+0xa9/0xbb
[  243.594792]  [<ffffffff801946f2>] keventd_create_kthread+0x0/0x79
[  243.594795]  [<ffffffff80161283>] schedule_timeout+0x8d/0xb4
[  243.594799]  [<ffffffff8018b8ac>] process_timeout+0x0/0xb
[  243.594803]  [<ffffffff80160bcf>] io_schedule_timeout+0x28/0x33
[  243.594806]  [<ffffffff801a935e>] congestion_wait+0x6b/0x87
[  243.594810]  [<ffffffff8019489f>] autoremove_wake_function+0x0/0x38
[  243.594814]  [<ffffffff8014da11>] writeback_inodes+0xe1/0xea
[  243.594818]  [<ffffffff80153381>] pdflush+0x0/0x1e3
[  243.594821]  [<ffffffff801a732c>] wb_kupdate+0xbb/0x113
[  243.594825]  [<ffffffff80153381>] pdflush+0x0/0x1e3
[  243.594828]  [<ffffffff801534b9>] pdflush+0x138/0x1e3
[  243.594831]  [<ffffffff801a7271>] wb_kupdate+0x0/0x113
[  243.594835]  [<ffffffff801315d6>] kthread+0xd8/0x10b
[  243.594839]  [<ffffffff801270d2>] schedule_tail+0x45/0xa5
[  243.594843]  [<ffffffff8015bb78>] child_rip+0xa/0x12
[  243.594847]  [<ffffffff801946f2>] keventd_create_kthread+0x0/0x79
[  243.594850]  [<ffffffff801471f5>] worker_thread+0x0/0x14b
[  243.594854]  [<ffffffff801314fe>] kthread+0x0/0x10b
[  243.594858]  [<ffffffff8015bb6e>] child_rip+0x0/0x12

At a guess, I'd say we have a ton of memory under writeback, but the completions
aren't coming back.

Florin, can we please see /proc/meminfo as well?

Also the result of `echo m > /proc/sysrq-trigger'
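
To make the "ton of memory under writeback" guess easy to check, the relevant /proc/meminfo counters can be pulled out with a few lines of Python. This is only a sketch: the function name writeback_summary and the sample numbers are invented for illustration and are not from Florin's machine, and the field names Dirty, Writeback and NFS_Unstable are assumed to be what the kernel under test reports.

```python
def writeback_summary(meminfo_text):
    """Return the kB values of the writeback-related /proc/meminfo fields."""
    wanted = ("Dirty", "Writeback", "NFS_Unstable")
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        name = name.strip()
        if sep and name in wanted and rest.split():
            # Lines look like "Dirty:   1234 kB"; keep the integer part.
            values[name] = int(rest.split()[0])
    return values


# Made-up sample in /proc/meminfo format (hypothetical numbers).
SAMPLE = """\
MemTotal:      2057004 kB
Dirty:           18764 kB
Writeback:      301220 kB
NFS_Unstable:    95112 kB
"""

if __name__ == "__main__":
    print(writeback_summary(SAMPLE))
```

On a live system the same function could be fed open("/proc/meminfo").read(); a Writeback or NFS_Unstable figure that stays large and never drains would be consistent with completions not coming back.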

Thanks.

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  1:19               ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Trond Myklebust
                                   ` (4 preceding siblings ...)
  2007-04-18  2:58                 ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Andrew Morton
@ 2007-04-18  8:19                 ` Peter Zijlstra
  2007-04-18 16:41                   ` Peter Zijlstra
  5 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2007-04-18  8:19 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Linus Torvalds, Florin Iucha, Andrew Morton, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Tue, 2007-04-17 at 21:19 -0400, Trond Myklebust wrote:
> I've split the issues introduced by the 2.6.21-rcX write code up into 4
> subproblems.
> 
> The first patch is just a cleanup in order to ease review.
> 
> Patch number 2 ensures that we never release the PG_writeback flag until
> _after_ we've either discarded the unstable request altogether, or put it
> on the nfs_inode's commit or dirty lists.
> 
> Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
> uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
> may be redirtied.
> 
> Patch number 4 protects the NFS '.set_page_dirty' address_space operation
> against races with nfs_inode_add_request.

Ok, stuck them in, and my debug patch from yesterday, just in case...

However, I can't seem to run long enough to establish whether the
problem is gone. It deadlocks after 10-30 minutes due to missing IO
completions, whereas yesterday it took 45-60 minutes to trigger
the 'desynchronized value of nfs_i.ncommit' messages.

I will continue trying to get a good run; however, if you have some
(perhaps experimental .22) patches you want me to try...




* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  3:06                   ` Trond Myklebust
  2007-04-18  3:30                     ` Florin Iucha
@ 2007-04-18  9:54                     ` OGAWA Hirofumi
  1 sibling, 0 replies; 43+ messages in thread
From: OGAWA Hirofumi @ 2007-04-18  9:54 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Andrew Morton, Peter Zijlstra, Linus Torvalds, Florin Iucha,
	Adrian Bunk, linux-kernel

Trond Myklebust <Trond.Myklebust@netapp.com> writes:

> I was mainly interested in feedback from Peter, Florin and Ogawa-san to
> find out if this series fixes their problems. You were unfortunate
> enough to have been on earlier Ccs, so I didn't dare trim you off. :-)

Sorry. I'm trying to reproduce it, but unfortunately I can't
reproduce it on an unpatched kernel for now.

FWIW, syslog is here.

Apr 17 03:54:55 duaron kernel: [drm] Initialized i915 1.6.0 20060119 on minor 0
Apr 17 03:58:31 duaron kernel: tun: Universal TUN/TAP device driver, 1.6
Apr 17 03:58:31 duaron kernel: tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
Apr 17 03:58:44 duaron kernel: tun0: no IPv6 routers present
Apr 17 13:33:13 duaron kernel: NFS: desynchronized value of nfs_i.ncommit.
Apr 17 13:33:47 duaron last message repeated 16 times
Apr 17 13:34:49 duaron last message repeated 32 times
Apr 17 13:35:19 duaron last message repeated 7 times
Apr 17 13:39:39 duaron kernel: NFS: desynchronized value of nfs_i.ncommit.
Apr 17 13:40:39 duaron last message repeated 3 times
Apr 17 13:42:26 duaron kernel: NFS: desynchronized value of nfs_i.ncommit.
Apr 17 13:42:30 duaron last message repeated 3 times
Apr 18 00:58:22 duaron kernel: NFS: desynchronized value of nfs_i.ncommit.
Apr 18 00:59:09 duaron last message repeated 129 times
Apr 18 01:00:31 duaron last message repeated 24 times
Apr 18 01:02:16 duaron kernel: NFS: desynchronized value of nfs_i.ncommit.
Apr 18 01:03:25 duaron last message repeated 47 times
Apr 18 01:04:30 duaron last message repeated 16 times
Apr 18 01:05:31 duaron last message repeated 28 times
Apr 18 01:06:33 duaron last message repeated 32 times
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  4:07                         ` Florin Iucha
  2007-04-18  4:13                           ` Andrew Morton
@ 2007-04-18 11:38                           ` Trond Myklebust
  1 sibling, 0 replies; 43+ messages in thread
From: Trond Myklebust @ 2007-04-18 11:38 UTC (permalink / raw)
  To: Florin Iucha
  Cc: Andrew Morton, Peter Zijlstra, Linus Torvalds, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Tue, 2007-04-17 at 23:07 -0500, Florin Iucha wrote:
> When 'big-copy' hangs, if I switch to a different console and run
> 'lsof', '[u]mount', or use shell completion on a network mount then that
> process goes into D state.  I cannot umount the network shares nor
> stop autofs.  I cannot do a clean reboot, I have to ssh
> in and "echo s > /proc/sysrq-trigger; echo u > /proc/sysrq-trigger;
> echo b > /proc/sysrq-trigger" .

What happens if you issue "echo 0 >/proc/sys/sunrpc/rpc_debug"?

> I am not mounting anything using CIFS, but I could give it a try.
> 
> I could transfer 75 GB without hiccup with 2.6.19 using NFS4 and CIFS,
> and with 2.6.20 using CIFS.  2.6.20 works fine under reasonably light
> load, with gnome sessions logging in and out several times a day.

How about NFSv3? I'd like to eliminate any issues with NFSv4 state.

I've also attached a little patch that I used in order to debug the list
consistency issues. Could you try it on top of the 4 I sent last night?

Cheers
  Trond



[-- Attachment #2: linux-2.6.21-031-debugging_do_not_merge.dif --]
[-- Type: message/rfc822, Size: 3135 bytes --]

From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: No Subject
Date: Sun, 15 Apr 2007 19:02:47 -0400
Message-ID: <1176896287.6796.48.camel@heimdal.trondhjem.org>

Adds consistency checks for nfs_page list operations

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/write.c           |    8 ++++++--
 include/linux/nfs_page.h |    3 +++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index cadbf3c..9be626d 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -194,6 +194,7 @@ static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
 	nfs_grow_file(page, offset, count);
 	/* Set the PG_uptodate flag? */
 	nfs_mark_uptodate(page, offset, count);
+	WARN_ON(test_bit(PG_NEED_COMMIT,&(req)->wb_flags));
 	nfs_unlock_request(req);
 	return 0;
 }
@@ -459,6 +460,7 @@ nfs_mark_request_commit(struct nfs_page *req)
 	struct inode *inode = req->wb_context->dentry->d_inode;
 	struct nfs_inode *nfsi = NFS_I(inode);
 
+	WARN_ON(nfs_dirty_request(req));
 	spin_lock(&nfsi->req_lock);
 	nfs_list_add_request(req, &nfsi->commit);
 	nfsi->ncommit++;
@@ -552,7 +554,7 @@ static void nfs_cancel_commit_list(struct list_head *head)
 		req = nfs_list_entry(head->next);
 		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
 		nfs_list_remove_request(req);
-		clear_bit(PG_NEED_COMMIT, &(req)->wb_flags);
+		WARN_ON(!test_and_clear_bit(PG_NEED_COMMIT,&(req)->wb_flags));
 		nfs_inode_remove_request(req);
 		nfs_unlock_request(req);
 	}
@@ -1033,6 +1035,7 @@ static void nfs_writeback_done_full(struct rpc_task *task, void *calldata)
 
 		if (nfs_write_need_commit(data)) {
 			memcpy(&req->wb_verf, &data->verf, sizeof(req->wb_verf));
+			set_bit(PG_NEED_COMMIT,&(req)->wb_flags);
 			nfs_mark_request_commit(req);
 			nfs_end_page_writeback(page);
 			dprintk(" marked for commit\n");
@@ -1206,6 +1209,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
 		nfs_list_remove_request(req);
 		nfs_mark_request_commit(req);
 		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
+		WARN_ON(!test_and_clear_bit(PG_NEED_COMMIT,&(req)->wb_flags));
 		nfs_clear_page_writeback(req);
 	}
 	return -ENOMEM;
@@ -1229,7 +1233,7 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata)
 	while (!list_empty(&data->pages)) {
 		req = nfs_list_entry(data->pages.next);
 		nfs_list_remove_request(req);
-		clear_bit(PG_NEED_COMMIT, &(req)->wb_flags);
+		WARN_ON(!test_and_clear_bit(PG_NEED_COMMIT,&(req)->wb_flags));
 		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
 
 		dprintk("NFS: commit (%s/%Ld %d@%Ld)",
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 41afab6..75c2d34 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -116,6 +116,9 @@ nfs_lock_request(struct nfs_page *req)
 static inline void
 nfs_list_add_request(struct nfs_page *req, struct list_head *head)
 {
+	BUG_ON(!list_empty(&req->wb_list));
+	BUG_ON(req->wb_list_head != NULL);
+
 	list_add_tail(&req->wb_list, head);
 	req->wb_list_head = head;
 }

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  5:37                                 ` Andrew Morton
@ 2007-04-18 12:38                                   ` Florin Iucha
  2007-04-18 13:15                                     ` Trond Myklebust
  2007-04-29 19:41                                   ` Rogier Wolff
  1 sibling, 1 reply; 43+ messages in thread
From: Florin Iucha @ 2007-04-18 12:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Trond Myklebust, Peter Zijlstra, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
> Florin, can we please see /proc/meminfo as well?

   http://iucha.net/nfs/21-rc7-nfs2/meminfo

> Also the result of `echo m > /proc/sysrq-trigger'

   http://iucha.net/nfs/21-rc7-nfs2/big-copy

This has 'echo m > /proc/sysrq-trigger', 'echo t >
/proc/sysrq-trigger' and 'echo 0 > /proc/sys/sunrpc/rpc_debug'.

The output from the server's 'iostat 5' is at

   http://iucha.net/nfs/21-rc7-nfs2/iostat

On this run it copied 5.6G (vs. yesterday's 2.5G).

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18 12:38                                   ` Florin Iucha
@ 2007-04-18 13:15                                     ` Trond Myklebust
  2007-04-18 13:42                                       ` Florin Iucha
  0 siblings, 1 reply; 43+ messages in thread
From: Trond Myklebust @ 2007-04-18 13:15 UTC (permalink / raw)
  To: Florin Iucha
  Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Wed, 2007-04-18 at 07:38 -0500, Florin Iucha wrote:
> On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
> > Florin, can we please see /proc/meminfo as well?
> 
>    http://iucha.net/nfs/21-rc7-nfs2/meminfo
> 
> > Also the result of `echo m > /proc/sysrq-trigger'
> 
>    http://iucha.net/nfs/21-rc7-nfs2/big-copy
> 
> This has 'echo m > /proc/sysrq-trigger', 'echo t >
> /proc/sysrq-trigger' and 'echo 0 > /proc/sys/sunrpc/rpc_debug'.

Thanks.

So it looks as if you have a massive backlog of requests waiting in the
RPC layer to get sent. That would indeed trigger the BDI congestion
control stuff, and prevent you from sending more requests. The
interesting bit is this:

[  399.665314] -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
[  399.665338] 40373 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0de0        0 xprt_resend ffffffff804196bf ffffffff80440b10
[  399.665345] 40391 0001 0001    -11 ffff81007f418508 100003 ffff810078eb05c8        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665351] 40392 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0128        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665358] 40393 0001 0001    -11 ffff81007f418508 100003 ffff810078eb1158        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665364] 40394 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0f08        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665371] 40395 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0000        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665377] 40396 0001 0001    -11 ffff81007f418508 100003 ffff810078eb1030        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665384] 40397 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0cb8        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665390] 40398 0001 0001    -11 ffff81007f418508 100003 ffff810078eb06f0        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665397] 40399 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0940        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665404] 40400 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0818        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665410] 40401 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0378        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.665417] 40402 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0250        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.669252] 41086 0001 0001      0 ffff81007f418508 100003 ffff810078eb0a68    15000 xprt_pending ffffffff804196bf ffffffff80440b10
[  399.669258] 41087 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0b90        0 xprt_resend ffffffff804196bf ffffffff80440b10
[  399.669265] 41088 0001 0001    -11 ffff81007f418508 100003 ffff810078eb04a0        0 xprt_sending ffffffff804196bf ffffffff80440b10

There is only one request on the 'pending' queue. That would usually
indicate that the connection to the server is down. Can you check using
"netstat -t" whether or not there is a connection in the 'ESTABLISHED'
state to the server? Please also repeat the command a couple of times in
order to see if the socket/port number on the connection changes.
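
The "only one request on the pending queue" observation can be reproduced mechanically by tallying the rpcwait column of the task dump. The following Python is a minimal sketch, not part of any NFS tooling: the function name tally_rpc_queues is made up, and it assumes the 11-column layout shown in the dump above (pid, proc, flgs, status, client, prog, rqstp, timeout, rpcwait, action, ops), optionally prefixed by a printk timestamp.

```python
from collections import Counter


def tally_rpc_queues(dump_text):
    """Count RPC tasks per wait queue in an rpc_debug task dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        # Drop the "[  399.665338]" printk timestamp, if present.
        _, _, body = line.rpartition("]")
        fields = body.split()
        # Skip the column-header row and anything that is not a task row.
        if len(fields) == 11 and fields[0].isdigit():
            counts[fields[8]] += 1  # ninth column is the rpcwait queue name
    return counts


# Header plus four task rows copied from the dump quoted above.
SAMPLE = """\
[  399.665314] -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
[  399.665338] 40373 0001 0001    -11 ffff81007f418508 100003 ffff810078eb0de0        0 xprt_resend ffffffff804196bf ffffffff80440b10
[  399.665345] 40391 0001 0001    -11 ffff81007f418508 100003 ffff810078eb05c8        0 xprt_sending ffffffff804196bf ffffffff80440b10
[  399.669252] 41086 0001 0001      0 ffff81007f418508 100003 ffff810078eb0a68    15000 xprt_pending ffffffff804196bf ffffffff80440b10
[  399.669265] 41088 0001 0001    -11 ffff81007f418508 100003 ffff810078eb04a0        0 xprt_sending ffffffff804196bf ffffffff80440b10
"""

if __name__ == "__main__":
    for queue, n in tally_rpc_queues(SAMPLE).most_common():
        print(queue, n)
```

Run against the full dump, a large xprt_sending count next to a single xprt_pending entry is the pattern described here: many requests queued to go out, almost none actually awaiting a reply.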

Cheers
  Trond

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18 13:15                                     ` Trond Myklebust
@ 2007-04-18 13:42                                       ` Florin Iucha
  2007-04-18 14:11                                         ` Trond Myklebust
  2007-04-18 14:14                                         ` Florin Iucha
  0 siblings, 2 replies; 43+ messages in thread
From: Florin Iucha @ 2007-04-18 13:42 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Wed, Apr 18, 2007 at 09:15:31AM -0400, Trond Myklebust wrote:
> There is only one request on the 'pending' queue. That would usually
> indicate that the connection to the server is down. Can you check using
> "netstat -t" whether or not there is a connection in the 'ESTABLISHED'
> state to the server? Please also repeat the command a couple of times in
> order to see if the socket/port number on the connection changes.

This is with your fifth patch on top of the previous four patches:

   http://iucha.net/nfs/21-rc7-nfs3/big-copy

Again, it has memory, stack traces and rpc_debug.

The iostat 5 output:

   http://iucha.net/nfs/21-rc7-nfs3/iostat

The netstat outputs are stable (not changed in 5 minutes):

   http://iucha.net/nfs/21-rc7-nfs3/netstat-server :

tcp        1      0 hermes.iucha.org:nfs    zeus.iucha.org:799      CLOSE_WAIT 
tcp        0      0 hermes.iucha.org:nfs    zeus.iucha.org:976      ESTABLISHED

   http://iucha.net/nfs/21-rc7-nfs3/netstat-client

Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 zeus.iucha.org:976      hermes.iucha.org:nfs    ESTABLISHED
tcp        0      0 zeus.iucha.org:ssh      hermes.iucha.org:56880  ESTABLISHED
tcp        0      0 zeus.iucha.org:ssh      hermes.iucha.org:45176  ESTABLISHED

Could the port in CLOSE_WAIT state be the culprit?  (FWIW
the server has been up for 38 days and subjected to
this nfs test quite a bit without showing any stress).
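
For repeating the netstat check over time, the NFS rows can be filtered out of saved "netstat -t" output and compared between runs. This Python is a rough sketch under assumptions: nfs_connections is a hypothetical helper name, it expects the six-column row layout shown above, and it matches on the service suffix as printed ("nfs" here; with "netstat -tn" the suffix would be the numeric port, 2049, instead).

```python
def nfs_connections(netstat_text, service="nfs"):
    """Return (local, remote, state) for TCP rows that involve the service."""
    found = []
    suffix = ":" + service
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or remote.endswith(suffix):
            found.append((local, remote, state))
    return found


# The two server-side rows quoted above.
SAMPLE = """\
tcp        1      0 hermes.iucha.org:nfs    zeus.iucha.org:799      CLOSE_WAIT
tcp        0      0 hermes.iucha.org:nfs    zeus.iucha.org:976      ESTABLISHED
"""

if __name__ == "__main__":
    conns = nfs_connections(SAMPLE)
    established = [c for c in conns if c[2] == "ESTABLISHED"]
    print(established)
```

Comparing the ESTABLISHED tuple from successive runs shows whether the client-side port stays put (a stable connection) or keeps changing (repeated reconnects).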

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18 13:42                                       ` Florin Iucha
@ 2007-04-18 14:11                                         ` Trond Myklebust
  2007-04-18 14:17                                           ` Florin Iucha
  2007-04-19  1:52                                           ` Florin Iucha
  2007-04-18 14:14                                         ` Florin Iucha
  1 sibling, 2 replies; 43+ messages in thread
From: Trond Myklebust @ 2007-04-18 14:11 UTC (permalink / raw)
  To: Florin Iucha
  Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Wed, 2007-04-18 at 08:42 -0500, Florin Iucha wrote:
> On Wed, Apr 18, 2007 at 09:15:31AM -0400, Trond Myklebust wrote:
> > There is only one request on the 'pending' queue. That would usually
> > indicate that the connection to the server is down. Can you check using
> > "netstat -t" whether or not there is a connection in the 'ESTABLISHED'
> > state to the server? Please also repeat the command a couple of times in
> > order to see if the socket/port number on the connection changes.
> 
> This is with your fifth patch on top of the previous four patches:
> 
>    http://iucha.net/nfs/21-rc7-nfs3/big-copy
> 
> Again, it has memory, stack traces and rpc_debug.
> 
> The iostat 5 output:
> 
>    http://iucha.net/nfs/21-rc7-nfs3/iostat
> 
> The netstat outputs are stable (not changed in 5 minutes):
> 
>    http://iucha.net/nfs/21-rc7-nfs3/netstat-server :
> 
> tcp        1      0 hermes.iucha.org:nfs    zeus.iucha.org:799      CLOSE_WAIT 
> tcp        0      0 hermes.iucha.org:nfs    zeus.iucha.org:976      ESTABLISHED
> 
>    http://iucha.net/nfs/21-rc7-nfs3/netstat-client
> 
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State      
> tcp        0      0 zeus.iucha.org:976      hermes.iucha.org:nfs    ESTABLISHED
> tcp        0      0 zeus.iucha.org:ssh      hermes.iucha.org:56880  ESTABLISHED
> tcp        0      0 zeus.iucha.org:ssh      hermes.iucha.org:45176  ESTABLISHED
> 
> Could the port in CLOSE_WAIT state be the culprit?  (FWIW
> the server has been up for 38 days and subjected to
> this nfs test quite a bit without showing any stress).

The port in CLOSE_WAIT shows that a socket was closed down recently, but
once the connection is re-established, the client should start sending
data.
Do you have a copy of wireshark or ethereal on hand? If so, could you
take a look at whether or not any NFS traffic is going between the
client and server once the hang happens?
Note that the timeout value is 60 seconds, so if you see no immediate
traffic, then let the ethereal/wireshark session keep running for a
couple more minutes.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18 13:42                                       ` Florin Iucha
  2007-04-18 14:11                                         ` Trond Myklebust
@ 2007-04-18 14:14                                         ` Florin Iucha
  1 sibling, 0 replies; 43+ messages in thread
From: Florin Iucha @ 2007-04-18 14:14 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1260 bytes --]

On Wed, Apr 18, 2007 at 08:42:25AM -0500, Florin Iucha wrote:
> On Wed, Apr 18, 2007 at 09:15:31AM -0400, Trond Myklebust wrote:
> The netstat outputs are stable (not changed in 5 minutes):
> 
>    http://iucha.net/nfs/21-rc7-nfs3/netstat-server :
> 
> tcp        1      0 hermes.iucha.org:nfs    zeus.iucha.org:799      CLOSE_WAIT 
> tcp        0      0 hermes.iucha.org:nfs    zeus.iucha.org:976      ESTABLISHED
> 
>    http://iucha.net/nfs/21-rc7-nfs3/netstat-client
> 
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State      
> tcp        0      0 zeus.iucha.org:976      hermes.iucha.org:nfs    ESTABLISHED
> tcp        0      0 zeus.iucha.org:ssh      hermes.iucha.org:56880  ESTABLISHED
> tcp        0      0 zeus.iucha.org:ssh      hermes.iucha.org:45176  ESTABLISHED
> 
> Could the port in CLOSE_WAIT state be the culprit?  (FWIW
> the server has been up for 38 days and subjected to
> this nfs test quite a bit without showing any stress).

The CLOSE_WAIT went away as soon as I rebooted the client.  Something
was holding it up...

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18 14:11                                         ` Trond Myklebust
@ 2007-04-18 14:17                                           ` Florin Iucha
  2007-04-18 14:19                                             ` Trond Myklebust
  2007-04-19  1:52                                           ` Florin Iucha
  1 sibling, 1 reply; 43+ messages in thread
From: Florin Iucha @ 2007-04-18 14:17 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1074 bytes --]

On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
> On Wed, 2007-04-18 at 08:42 -0500, Florin Iucha wrote:
> > Could the port in CLOSE_WAIT state be the culprit?  (FWIW
> > the server has been up for 38 days and subjected to
> > this nfs test quite a bit without showing any stress).
> 
> The port in CLOSE_WAIT shows that a socket was closed down recently, but
> once the connection is re-established, the client should start sending
> data.
> Do you have a copy of wireshark or ethereal on hand? If so, could you
> take a look at whether or not any NFS traffic is going between the
> client and server once the hang happens?
> Note that the timeout value is 60 seconds, so if you see no immediate
> traffic, then let the ethereal/wireshark session keep running for a
> couple more minutes.

Should I run wireshark/ethereal on the client or on the server?

I'll get a trace tonight (10 PM CST) and get back to you.

Thanks,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18 14:17                                           ` Florin Iucha
@ 2007-04-18 14:19                                             ` Trond Myklebust
  0 siblings, 0 replies; 43+ messages in thread
From: Trond Myklebust @ 2007-04-18 14:19 UTC (permalink / raw)
  To: Florin Iucha; +Cc: linux-kernel

On Wed, 2007-04-18 at 09:17 -0500, Florin Iucha wrote:
> On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
> > On Wed, 2007-04-18 at 08:42 -0500, Florin Iucha wrote:
> > > Could the port in CLOSE_WAIT state be the culprit?  (FWIW
> > > the server has been up for 38 days and subjected to
> > > this nfs test quite a bit without showing any stress).
> > 
> > The port in CLOSE_WAIT shows that a socket was closed down recently, but
> > once the connection is re-established, the client should start sending
> > data.
> > Do you have a copy of wireshark or ethereal on hand? If so, could you
> > take a look at whether or not any NFS traffic is going between the
> > client and server once the hang happens?
> > Note that the timeout value is 60 seconds, so if you see no immediate
> > traffic, then let the ethereal/wireshark session keep running for a
> > couple more minutes.
> 
> Should I run wireshark/ethereal on the client or on the server?

On the client, please, for the moment.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  8:19                 ` Peter Zijlstra
@ 2007-04-18 16:41                   ` Peter Zijlstra
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2007-04-18 16:41 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Linus Torvalds, Florin Iucha, Andrew Morton, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Wed, 2007-04-18 at 10:19 +0200, Peter Zijlstra wrote:
> On Tue, 2007-04-17 at 21:19 -0400, Trond Myklebust wrote:
> > I've split the issues introduced by the 2.6.21-rcX write code up into 4
> > subproblems.
> > 
> > The first patch is just a cleanup in order to ease review.
> > 
> > Patch number 2 ensures that we never release the PG_writeback flag until
> > _after_ we've either discarded the unstable request altogether, or put it
> > on the nfs_inode's commit or dirty lists.
> > 
> > Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
> > uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
> > may be redirtied.
> > 
> > Patch number 4 protects the NFS '.set_page_dirty' address_space operation
> > against races with nfs_inode_add_request.
> 
> Ok, stuck them in, and my debug patch from yesterday, just in case...
> 
> However, I can't seem to run long enough to establish whether the
> problem is gone. It deadlocks between 10-30 minutes due to missing IO
> completions, whereas yesterday it took between 45-60 minutes to trigger
> the 'desynchronized value of nfs_i.ncommit' messages.
> 
> I will continue trying to get a good run,

Just got one around 80-90 minutes, no 'desynchronized value of
nfs_i.ncommit' errors.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18 14:11                                         ` Trond Myklebust
  2007-04-18 14:17                                           ` Florin Iucha
@ 2007-04-19  1:52                                           ` Florin Iucha
  2007-04-19  2:45                                             ` Trond Myklebust
  1 sibling, 1 reply; 43+ messages in thread
From: Florin Iucha @ 2007-04-19  1:52 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 999 bytes --]

On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
> Do you have a copy of wireshark or ethereal on hand? If so, could you
> take a look at whether or not any NFS traffic is going between the
> client and server once the hang happens?

I used the following command 

   tcpdump -w nfs-traffic -i eth0 -vv -tt dst port nfs

to capture

   http://iucha.net/nfs/21-rc7-nfs4/nfs-traffic.bz2

I started the capture before starting the copy and left it to run for
a few minutes after the traffic slowed to a crawl.

The iostat and vmstat are at:

   http://iucha.net/nfs/21-rc7-nfs4/iostat
   http://iucha.net/nfs/21-rc7-nfs4/vmstat
   
It seems that my original problem report had a big mistake!  There is
no hang, but at some point the write slows down to a trickle (from
40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log.

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-19  1:52                                           ` Florin Iucha
@ 2007-04-19  2:45                                             ` Trond Myklebust
  2007-04-19  4:38                                               ` Success! Was: " Florin Iucha
  2007-04-19 15:12                                               ` Chuck Lever
  0 siblings, 2 replies; 43+ messages in thread
From: Trond Myklebust @ 2007-04-19  2:45 UTC (permalink / raw)
  To: Florin Iucha, Mr. Charles Edward Lever
  Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra, Adrian Bunk,
	OGAWA Hirofumi, linux-kernel

On Wed, 2007-04-18 at 20:52 -0500, Florin Iucha wrote:
> On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
> > Do you have a copy of wireshark or ethereal on hand? If so, could you
> > take a look at whether or not any NFS traffic is going between the
> > client and server once the hang happens?
> 
> I used the following command 
> 
>    tcpdump -w nfs-traffic -i eth0 -vv -tt dst port nfs
> 
> to capture
> 
>    http://iucha.net/nfs/21-rc7-nfs4/nfs-traffic.bz2
> 
> I started the capture before starting the copy and left it to run for
> a few minutes after the traffic slowed to a crawl.
> 
> The iostat and vmstat are at:
> 
>    http://iucha.net/nfs/21-rc7-nfs4/iostat
>    http://iucha.net/nfs/21-rc7-nfs4/vmstat
>    
> It seems that my original problem report had a big mistake!  There is
> no hang, but at some point the write slows down to a trickle (from
> 40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log.

Yeah. You only captured the outgoing traffic to the server, but already
it looks as if there were 'interesting' things going on. In frames 29346
to 29350, the traffic stops altogether for 5 seconds (I only see
keepalives) then it starts up again. Ditto for frames 40477-40482
(another 5 seconds). ...
Then at around frame 92072, the client starts to send a bunch of RSTs.
Aha.... I'll bet that reverting the appended patch fixes the problem.

Chuck's assumption, that if _no_ request bytes have been sent and yet
the request is on the 'receive list' then it must be a resend, is
patently false in the case where the send queue just happens to be full.
A better solution would probably be to disconnect the socket following
the ETIMEDOUT handling in call_status().
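
To illustrate the point above, here is a minimal userspace sketch (not
kernel code) of why the zero-bytes-sent test is ambiguous. struct
toy_req, its fields, and all function names are hypothetical stand-ins
invented for this example. Only rq_bytes_sent, the 'receive list' and
ETIMEDOUT come from the discussion itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the request state discussed above. */
struct toy_req {
	size_t bytes_sent;      /* models req->rq_bytes_sent */
	bool   on_receive_list; /* request already queued to await a reply */
	bool   timed_out;       /* a retransmit timeout (-ETIMEDOUT) was seen */
};

/* Heuristic from the patch under discussion: zero bytes sent on an
 * already queued request is taken to mean "this is a retransmission". */
static bool looks_like_resend_by_bytes(const struct toy_req *req)
{
	return req->on_receive_list && req->bytes_sent == 0;
}

/* Alternative along the lines suggested above: treat the request as a
 * resend only once the timeout path has actually run. */
static bool is_resend_after_timeout(const struct toy_req *req)
{
	return req->timed_out;
}

static void toy_demo(void)
{
	/* First transmit attempt: queued for a reply, send queue full,
	 * so nothing has been written yet and no timeout has fired. */
	struct toy_req first_try = { 0, true, false };

	/* A genuine retry: the previous attempt timed out. */
	struct toy_req retry = { 0, true, true };

	/* The byte-count heuristic cannot tell the two cases apart. */
	assert(looks_like_resend_by_bytes(&first_try));
	assert(looks_like_resend_by_bytes(&retry));

	/* The timeout-based test flags only the genuine retry. */
	assert(!is_resend_after_timeout(&first_try));
	assert(is_resend_after_timeout(&retry));
}
```

In this toy model the first attempt with a full send queue is
indistinguishable from a real retry under the byte-count test, which is
the situation that would trigger a needless disconnect.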

Cheers
  Trond
-------------------------------------------
commit 43d78ef2ba5bec26d0315859e8324bfc0be23766
Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Tue Feb 6 18:26:11 2007 -0500

    NFS: disconnect before retrying NFSv4 requests over TCP
    
    RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
    twice on the same connection unless it is the NULL procedure.  Section
    3.1.1 suggests that the client should disconnect and reconnect if it
    wants to retry a request.
    
    Implement this by adding an rpc_clnt flag that an ULP can use to
    specify that the underlying transport should be disconnected on a
    major timeout.  The NFSv4 client asserts this new flag, and requests
    no retries after a minor retransmit timeout.
    
    Note that disconnecting on a retransmit is in general not safe to do
    if the RPC client does not reuse the TCP port number when reconnecting.
    
    See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6
    
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a3191f0..c46e94f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -394,7 +394,8 @@ static void nfs_init_timeout_values(struct rpc_timeout *to, int proto,
 static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
 						unsigned int timeo,
 						unsigned int retrans,
-						rpc_authflavor_t flavor)
+						rpc_authflavor_t flavor,
+						int flags)
 {
 	struct rpc_timeout	timeparms;
 	struct rpc_clnt		*clnt = NULL;
@@ -407,6 +408,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
 		.program	= &nfs_program,
 		.version	= clp->rpc_ops->version,
 		.authflavor	= flavor,
+		.flags		= flags,
 	};
 
 	if (!IS_ERR(clp->cl_rpcclient))
@@ -548,7 +550,7 @@ static int nfs_init_client(struct nfs_client *clp, const struct nfs_mount_data *
 	 * - RFC 2623, sec 2.3.2
 	 */
 	error = nfs_create_rpc_client(clp, proto, data->timeo, data->retrans,
-			RPC_AUTH_UNIX);
+					RPC_AUTH_UNIX, 0);
 	if (error < 0)
 		goto error;
 	nfs_mark_client_ready(clp, NFS_CS_READY);
@@ -868,7 +870,8 @@ static int nfs4_init_client(struct nfs_client *clp,
 	/* Check NFS protocol revision and initialize RPC op vector */
 	clp->rpc_ops = &nfs_v4_clientops;
 
-	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour);
+	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour,
+					RPC_CLNT_CREATE_DISCRTRY);
 	if (error < 0)
 		goto error;
 	memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index a1be89d..c7a78ee 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -40,6 +40,7 @@ struct rpc_clnt {
 
 	unsigned int		cl_softrtry : 1,/* soft timeouts */
 				cl_intr     : 1,/* interruptible */
+				cl_discrtry : 1,/* disconnect before retry */
 				cl_autobind : 1,/* use getport() */
 				cl_oneshot  : 1,/* dispose after use */
 				cl_dead     : 1;/* abandoned */
@@ -111,6 +112,7 @@ struct rpc_create_args {
 #define RPC_CLNT_CREATE_ONESHOT		(1UL << 3)
 #define RPC_CLNT_CREATE_NONPRIVPORT	(1UL << 4)
 #define RPC_CLNT_CREATE_NOPING		(1UL << 5)
+#define RPC_CLNT_CREATE_DISCRTRY	(1UL << 6)
 
 struct rpc_clnt *rpc_create(struct rpc_create_args *args);
 struct rpc_clnt	*rpc_bind_new_program(struct rpc_clnt *,
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 393e70a..c21aa0a 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -249,6 +249,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
 		clnt->cl_autobind = 1;
 	if (args->flags & RPC_CLNT_CREATE_ONESHOT)
 		clnt->cl_oneshot = 1;
+	if (args->flags & RPC_CLNT_CREATE_DISCRTRY)
+		clnt->cl_discrtry = 1;
 
 	return clnt;
 }
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index cf59f7d..1975139 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -735,6 +735,16 @@ void xprt_transmit(struct rpc_task *task)
 			xprt_reset_majortimeo(req);
 			/* Turn off autodisconnect */
 			del_singleshot_timer_sync(&xprt->timer);
+		} else {
+			/* If all request bytes have been sent,
+			 * then we must be retransmitting this one */
+			if (!req->rq_bytes_sent) {
+				if (task->tk_client->cl_discrtry) {
+					xprt_disconnect(xprt);
+					task->tk_status = -ENOTCONN;
+					return;
+				}
+			}
 		}
 	} else if (!req->rq_bytes_sent)
 		return;

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-19  2:45                                             ` Trond Myklebust
@ 2007-04-19  4:38                                               ` Florin Iucha
  2007-04-19 15:12                                               ` Chuck Lever
  1 sibling, 0 replies; 43+ messages in thread
From: Florin Iucha @ 2007-04-19  4:38 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Mr. Charles Edward Lever, Andrew Morton, Linus Torvalds,
	Peter Zijlstra, Adrian Bunk, OGAWA Hirofumi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6011 bytes --]

On Wed, Apr 18, 2007 at 10:45:13PM -0400, Trond Myklebust wrote:
> On Wed, 2007-04-18 at 20:52 -0500, Florin Iucha wrote:
> > It seems that my original problem report had a big mistake!  There is
> > no hang, but at some point the write slows down to a trickle (from
> > 40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log.
> 
> Yeah. You only captured the outgoing traffic to the server, but already
> it looks as if there were 'interesting' things going on. In frames 29346
> to 29350, the traffic stops altogether for 5 seconds (I only see
> keepalives) then it starts up again. Ditto for frames 40477-40482
> (another 5 seconds). ...
> Then at around frame 92072, the client starts to send a bunch of RSTs.
> Aha.... I'll bet that reverting the appended patch fixes the problem.

You win!

Reverting this patch (on top of your previous 5) allowed the big copy
to complete (70GB) as well as successful log-in to gnome!

Acked-By: Florin Iucha <florin@iucha.net>

Thanks so much for your patience with this elusive bug and stubborn
bug reporter!

Regards,
florin

> -------------------------------------------
> commit 43d78ef2ba5bec26d0315859e8324bfc0be23766
> Author: Chuck Lever <chuck.lever@oracle.com>
> Date:   Tue Feb 6 18:26:11 2007 -0500
> 
>     NFS: disconnect before retrying NFSv4 requests over TCP
>     
>     RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
>     twice on the same connection unless it is the NULL procedure.  Section
>     3.1.1 suggests that the client should disconnect and reconnect if it
>     wants to retry a request.
>     
>     Implement this by adding an rpc_clnt flag that an ULP can use to
>     specify that the underlying transport should be disconnected on a
>     major timeout.  The NFSv4 client asserts this new flag, and requests
>     no retries after a minor retransmit timeout.
>     
>     Note that disconnecting on a retransmit is in general not safe to do
>     if the RPC client does not reuse the TCP port number when reconnecting.
>     
>     See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6
>     
>     Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index a3191f0..c46e94f 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -394,7 +394,8 @@ static void nfs_init_timeout_values(struct rpc_timeout *to, int proto,
>  static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
>  						unsigned int timeo,
>  						unsigned int retrans,
> -						rpc_authflavor_t flavor)
> +						rpc_authflavor_t flavor,
> +						int flags)
>  {
>  	struct rpc_timeout	timeparms;
>  	struct rpc_clnt		*clnt = NULL;
> @@ -407,6 +408,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
>  		.program	= &nfs_program,
>  		.version	= clp->rpc_ops->version,
>  		.authflavor	= flavor,
> +		.flags		= flags,
>  	};
>  
>  	if (!IS_ERR(clp->cl_rpcclient))
> @@ -548,7 +550,7 @@ static int nfs_init_client(struct nfs_client *clp, const struct nfs_mount_data *
>  	 * - RFC 2623, sec 2.3.2
>  	 */
>  	error = nfs_create_rpc_client(clp, proto, data->timeo, data->retrans,
> -			RPC_AUTH_UNIX);
> +					RPC_AUTH_UNIX, 0);
>  	if (error < 0)
>  		goto error;
>  	nfs_mark_client_ready(clp, NFS_CS_READY);
> @@ -868,7 +870,8 @@ static int nfs4_init_client(struct nfs_client *clp,
>  	/* Check NFS protocol revision and initialize RPC op vector */
>  	clp->rpc_ops = &nfs_v4_clientops;
>  
> -	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour);
> +	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour,
> +					RPC_CLNT_CREATE_DISCRTRY);
>  	if (error < 0)
>  		goto error;
>  	memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
> diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
> index a1be89d..c7a78ee 100644
> --- a/include/linux/sunrpc/clnt.h
> +++ b/include/linux/sunrpc/clnt.h
> @@ -40,6 +40,7 @@ struct rpc_clnt {
>  
>  	unsigned int		cl_softrtry : 1,/* soft timeouts */
>  				cl_intr     : 1,/* interruptible */
> +				cl_discrtry : 1,/* disconnect before retry */
>  				cl_autobind : 1,/* use getport() */
>  				cl_oneshot  : 1,/* dispose after use */
>  				cl_dead     : 1;/* abandoned */
> @@ -111,6 +112,7 @@ struct rpc_create_args {
>  #define RPC_CLNT_CREATE_ONESHOT		(1UL << 3)
>  #define RPC_CLNT_CREATE_NONPRIVPORT	(1UL << 4)
>  #define RPC_CLNT_CREATE_NOPING		(1UL << 5)
> +#define RPC_CLNT_CREATE_DISCRTRY	(1UL << 6)
>  
>  struct rpc_clnt *rpc_create(struct rpc_create_args *args);
>  struct rpc_clnt	*rpc_bind_new_program(struct rpc_clnt *,
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 393e70a..c21aa0a 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -249,6 +249,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
>  		clnt->cl_autobind = 1;
>  	if (args->flags & RPC_CLNT_CREATE_ONESHOT)
>  		clnt->cl_oneshot = 1;
> +	if (args->flags & RPC_CLNT_CREATE_DISCRTRY)
> +		clnt->cl_discrtry = 1;
>  
>  	return clnt;
>  }
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index cf59f7d..1975139 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -735,6 +735,16 @@ void xprt_transmit(struct rpc_task *task)
>  			xprt_reset_majortimeo(req);
>  			/* Turn off autodisconnect */
>  			del_singleshot_timer_sync(&xprt->timer);
> +		} else {
> +			/* If all request bytes have been sent,
> +			 * then we must be retransmitting this one */
> +			if (!req->rq_bytes_sent) {
> +				if (task->tk_client->cl_discrtry) {
> +					xprt_disconnect(xprt);
> +					task->tk_status = -ENOTCONN;
> +					return;
> +				}
> +			}
>  		}
>  	} else if (!req->rq_bytes_sent)
>  		return;
> 

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-19  2:45                                             ` Trond Myklebust
  2007-04-19  4:38                                               ` Success! Was: " Florin Iucha
@ 2007-04-19 15:12                                               ` Chuck Lever
  2007-04-19 15:17                                                 ` Trond Myklebust
  1 sibling, 1 reply; 43+ messages in thread
From: Chuck Lever @ 2007-04-19 15:12 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Florin Iucha, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Adrian Bunk, OGAWA Hirofumi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6687 bytes --]

Trond Myklebust wrote:
> On Wed, 2007-04-18 at 20:52 -0500, Florin Iucha wrote:
>> On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
>>> Do you have a copy of wireshark or ethereal on hand? If so, could you
>>> take a look at whether or not any NFS traffic is going between the
>>> client and server once the hang happens?
>> I used the following command 
>>
>>    tcpdump -w nfs-traffic -i eth0 -vv -tt dst port nfs
>>
>> to capture
>>
>>    http://iucha.net/nfs/21-rc7-nfs4/nfs-traffic.bz2
>>
>> I started the capture before starting the copy and left it to run for
>> a few minutes after the traffic slowed to a crawl.
>>
>> The iostat and vmstat are at:
>>
>>    http://iucha.net/nfs/21-rc7-nfs4/iostat
>>    http://iucha.net/nfs/21-rc7-nfs4/vmstat
>>    
>> It seems that my original problem report had a big mistake!  There is
>> no hang, but at some point the write slows down to a trickle (from
>> 40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log.
> 
> Yeah. You only captured the outgoing traffic to the server, but already
> it looks as if there were 'interesting' things going on. In frames 29346
> to 29350, the traffic stops altogether for 5 seconds (I only see
> keepalives) then it starts up again. Ditto for frames 40477-40482
> (another 5 seconds). ...
> Then at around frame 92072, the client starts to send a bunch of RSTs.
> Aha.... I'll bet that reverting the appended patch fixes the problem.
> 
> Chuck's assumption, that if _no_ request bytes have been sent and yet
> the request is on the 'receive list' then it must be a resend, is
> patently false in the case where the send queue just happens to be full.

There are other places in the RPC client where "zero bytes sent" implies 
that the request has been sent.  The real problem here is that zeroing 
the "bytes sent" field is overloaded.

Perhaps instead of looking at the number of bytes sent, the logic in the 
last hunk of this patch should check which queue the request is sitting on.


> -------------------------------------------
> commit 43d78ef2ba5bec26d0315859e8324bfc0be23766
> Author: Chuck Lever <chuck.lever@oracle.com>
> Date:   Tue Feb 6 18:26:11 2007 -0500
> 
>     NFS: disconnect before retrying NFSv4 requests over TCP
>     
>     RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
>     twice on the same connection unless it is the NULL procedure.  Section
>     3.1.1 suggests that the client should disconnect and reconnect if it
>     wants to retry a request.
>     
>     Implement this by adding an rpc_clnt flag that an ULP can use to
>     specify that the underlying transport should be disconnected on a
>     major timeout.  The NFSv4 client asserts this new flag, and requests
>     no retries after a minor retransmit timeout.
>     
>     Note that disconnecting on a retransmit is in general not safe to do
>     if the RPC client does not reuse the TCP port number when reconnecting.
>     
>     See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6
>     
>     Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index a3191f0..c46e94f 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -394,7 +394,8 @@ static void nfs_init_timeout_values(struct rpc_timeout *to, int proto,
>  static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
>  						unsigned int timeo,
>  						unsigned int retrans,
> -						rpc_authflavor_t flavor)
> +						rpc_authflavor_t flavor,
> +						int flags)
>  {
>  	struct rpc_timeout	timeparms;
>  	struct rpc_clnt		*clnt = NULL;
> @@ -407,6 +408,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
>  		.program	= &nfs_program,
>  		.version	= clp->rpc_ops->version,
>  		.authflavor	= flavor,
> +		.flags		= flags,
>  	};
>  
>  	if (!IS_ERR(clp->cl_rpcclient))
> @@ -548,7 +550,7 @@ static int nfs_init_client(struct nfs_client *clp, const struct nfs_mount_data *
>  	 * - RFC 2623, sec 2.3.2
>  	 */
>  	error = nfs_create_rpc_client(clp, proto, data->timeo, data->retrans,
> -			RPC_AUTH_UNIX);
> +					RPC_AUTH_UNIX, 0);
>  	if (error < 0)
>  		goto error;
>  	nfs_mark_client_ready(clp, NFS_CS_READY);
> @@ -868,7 +870,8 @@ static int nfs4_init_client(struct nfs_client *clp,
>  	/* Check NFS protocol revision and initialize RPC op vector */
>  	clp->rpc_ops = &nfs_v4_clientops;
>  
> -	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour);
> +	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour,
> +					RPC_CLNT_CREATE_DISCRTRY);
>  	if (error < 0)
>  		goto error;
>  	memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
> diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
> index a1be89d..c7a78ee 100644
> --- a/include/linux/sunrpc/clnt.h
> +++ b/include/linux/sunrpc/clnt.h
> @@ -40,6 +40,7 @@ struct rpc_clnt {
>  
>  	unsigned int		cl_softrtry : 1,/* soft timeouts */
>  				cl_intr     : 1,/* interruptible */
> +				cl_discrtry : 1,/* disconnect before retry */
>  				cl_autobind : 1,/* use getport() */
>  				cl_oneshot  : 1,/* dispose after use */
>  				cl_dead     : 1;/* abandoned */
> @@ -111,6 +112,7 @@ struct rpc_create_args {
>  #define RPC_CLNT_CREATE_ONESHOT		(1UL << 3)
>  #define RPC_CLNT_CREATE_NONPRIVPORT	(1UL << 4)
>  #define RPC_CLNT_CREATE_NOPING		(1UL << 5)
> +#define RPC_CLNT_CREATE_DISCRTRY	(1UL << 6)
>  
>  struct rpc_clnt *rpc_create(struct rpc_create_args *args);
>  struct rpc_clnt	*rpc_bind_new_program(struct rpc_clnt *,
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 393e70a..c21aa0a 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -249,6 +249,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
>  		clnt->cl_autobind = 1;
>  	if (args->flags & RPC_CLNT_CREATE_ONESHOT)
>  		clnt->cl_oneshot = 1;
> +	if (args->flags & RPC_CLNT_CREATE_DISCRTRY)
> +		clnt->cl_discrtry = 1;
>  
>  	return clnt;
>  }
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index cf59f7d..1975139 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -735,6 +735,16 @@ void xprt_transmit(struct rpc_task *task)
>  			xprt_reset_majortimeo(req);
>  			/* Turn off autodisconnect */
>  			del_singleshot_timer_sync(&xprt->timer);
> +		} else {
> +			/* If all request bytes have been sent,
> +			 * then we must be retransmitting this one */
> +			if (!req->rq_bytes_sent) {
> +				if (task->tk_client->cl_discrtry) {
> +					xprt_disconnect(xprt);
> +					task->tk_status = -ENOTCONN;
> +					return;
> +				}
> +			}
>  		}
>  	} else if (!req->rq_bytes_sent)
>  		return;



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-19 15:12                                               ` Chuck Lever
@ 2007-04-19 15:17                                                 ` Trond Myklebust
  2007-04-19 15:50                                                   ` Florin Iucha
  0 siblings, 1 reply; 43+ messages in thread
From: Trond Myklebust @ 2007-04-19 15:17 UTC (permalink / raw)
  To: chuck.lever
  Cc: Florin Iucha, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Adrian Bunk, OGAWA Hirofumi, linux-kernel

On Thu, 2007-04-19 at 11:12 -0400, Chuck Lever wrote:
> Perhaps instead of looking at the number of bytes sent, the logic in the 
> last hunk of this patch should check which queue the request is sitting on.

??? It would be a bug for the request to be sitting on _any_ queue when
it enters xprt_transmit().

Here is the patch that I'm currently testing.

Cheers
  Trond
---------------------------
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Thu, 19 Apr 2007 09:55:44 -0400
RPC: Fix the TCP resend semantics for NFSv4

Fix a regression due to the patch "NFS: disconnect before retrying NFSv4
requests over TCP"

The assumption made in xprt_transmit() that the condition
	"req->rq_bytes_sent == 0 and request is on the receive list"
should imply that we're dealing with a retransmission is false.
Firstly, it may simply happen that the socket send queue was full
at the time the request was initially sent through xprt_transmit().
Secondly, doing this for each request that was retransmitted implies
that we disconnect and reconnect for _every_ request that happened to
be retransmitted irrespective of whether or not a disconnection has
already occurred.

Fix is to move this logic into the call_status request timeout handler.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/clnt.c |    4 ++++
 net/sunrpc/xprt.c |   10 ----------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 6d7221f..396cdbe 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1046,6 +1046,8 @@ call_status(struct rpc_task *task)
 		rpc_delay(task, 3*HZ);
 	case -ETIMEDOUT:
 		task->tk_action = call_timeout;
+		if (task->tk_client->cl_discrtry)
+			xprt_disconnect(task->tk_xprt);
 		break;
 	case -ECONNREFUSED:
 	case -ENOTCONN:
@@ -1169,6 +1171,8 @@ call_decode(struct rpc_task *task)
 out_retry:
 	req->rq_received = req->rq_private_buf.len = 0;
 	task->tk_status = 0;
+	if (task->tk_client->cl_discrtry)
+		xprt_disconnect(task->tk_xprt);
 }
 
 /*
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index ee6ffa0..456a145 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -735,16 +735,6 @@ void xprt_transmit(struct rpc_task *task)
 			xprt_reset_majortimeo(req);
 			/* Turn off autodisconnect */
 			del_singleshot_timer_sync(&xprt->timer);
-		} else {
-			/* If all request bytes have been sent,
-			 * then we must be retransmitting this one */
-			if (!req->rq_bytes_sent) {
-				if (task->tk_client->cl_discrtry) {
-					xprt_disconnect(xprt);
-					task->tk_status = -ENOTCONN;
-					return;
-				}
-			}
 		}
 	} else if (!req->rq_bytes_sent)
 		return;
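
For readers following the change, the difference between the removed check
and the new one can be modelled in userspace C. This is only a sketch under
stated assumptions: struct req_model and both helper functions are
hypothetical names, not kernel code; only rq_bytes_sent, cl_discrtry and
-ETIMEDOUT are taken from the patch. The fall-through into the -ETIMEDOUT
case from the case above it in call_status() is not modelled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; fields mirror the patch where possible. */
struct req_model {
	bool on_receive_list;    /* request is already queued for a reply */
	unsigned int bytes_sent; /* models req->rq_bytes_sent */
	bool discrtry;           /* models task->tk_client->cl_discrtry */
};

/*
 * Removed check from xprt_transmit(): "nothing sent yet, but already on
 * the receive list" was taken to mean a retransmission. The same state
 * also occurs when the socket send queue was full on the first attempt.
 */
static bool old_should_disconnect(const struct req_model *r)
{
	return r->on_receive_list && r->bytes_sent == 0 && r->discrtry;
}

/*
 * New behaviour: disconnect only when the RPC timed out (call_status)
 * or the reply could not be decoded (out_retry in call_decode).
 */
static bool new_should_disconnect(const struct req_model *r, int status,
				  bool garbage_reply)
{
	if (!r->discrtry)
		return false;
	return status == -ETIMEDOUT || garbage_reply;
}
```

In this model, a request that was merely delayed by a full send queue
triggers a disconnect under the old check but not under the new one.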

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-19 15:17                                                 ` Trond Myklebust
@ 2007-04-19 15:50                                                   ` Florin Iucha
  2007-04-19 16:09                                                     ` Trond Myklebust
  0 siblings, 1 reply; 43+ messages in thread
From: Florin Iucha @ 2007-04-19 15:50 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: chuck.lever, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Adrian Bunk, OGAWA Hirofumi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1004 bytes --]

On Thu, Apr 19, 2007 at 11:17:28AM -0400, Trond Myklebust wrote:
> On Thu, 2007-04-19 at 11:12 -0400, Chuck Lever wrote:
> > Perhaps instead of looking at the number of bytes sent, the logic in the 
> > last hunk of this patch should check which queue the request is sitting on.
> 
> ??? It would be a bug for the request to be sitting on _any_ queue when
> it enters xprt_transmit().
> 
> Here is the patch that I'm currently testing.

Trond,

What is the set of patches that you are testing?  I'd like to give
that a spin tonight as well.

It is possible that what makes my configuration more susceptible
to the problem is the fact that the client significantly overpowers
the server: Athlon x2 4200+ with 2 GB of RAM for the client vs. PIII
1 GHz 512 MB RAM for the server.  They both have gigabit ethernet
and both NICs and the switch support jumbo frames.

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-19 15:50                                                   ` Florin Iucha
@ 2007-04-19 16:09                                                     ` Trond Myklebust
  2007-04-19 19:58                                                       ` Failure! " Florin Iucha
  0 siblings, 1 reply; 43+ messages in thread
From: Trond Myklebust @ 2007-04-19 16:09 UTC (permalink / raw)
  To: Florin Iucha
  Cc: chuck.lever, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Adrian Bunk, OGAWA Hirofumi, linux-kernel

On Thu, 2007-04-19 at 10:50 -0500, Florin Iucha wrote:
> On Thu, Apr 19, 2007 at 11:17:28AM -0400, Trond Myklebust wrote:
> > On Thu, 2007-04-19 at 11:12 -0400, Chuck Lever wrote:
> > > Perhaps instead of looking at the number of bytes sent, the logic in the 
> > > last hunk of this patch should check which queue the request is sitting on.
> > 
> > ??? It would be a bug for the request to be sitting on _any_ queue when
> > it enters xprt_transmit().
> > 
> > Here is the patch that I'm currently testing.
> 
> Trond,
> 
> What is the set of patches that you are testing?  I'd like to give
> that a spin tonight as well.
> 
> It is possible that what makes my configuration more susceptible
> to the problem is the fact that the client significantly overpowers
> the server: Athlon x2 4200+ with 2 GB of RAM for the client vs. PIII
> 1 GHz 512 MB RAM for the server.  They both have gigabit ethernet
> and both NICs and the switch support jumbo frames.
> 
> Regards,
> florin
> 

See
   http://client.linux-nfs.org/Linux-2.6.x/2.6.21-rc7/

I'm giving the first 5 patches of that series (i.e.
linux-2.6.21-001-cleanup_unstable_write.dif to
linux-2.6.21-005-fix_nfsv4_resend.dif) an extra beating since those are
the ones that I feel should go into 2.6.21 final in order to fix the
read/write regressions that have been reported. They should be identical
to the patches that I posted on lkml in the past 3 days.

Please feel free to grab them and give them a test.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Failure! Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-19 16:09                                                     ` Trond Myklebust
@ 2007-04-19 19:58                                                       ` Florin Iucha
  2007-04-19 21:30                                                         ` Trond Myklebust
  0 siblings, 1 reply; 43+ messages in thread
From: Florin Iucha @ 2007-04-19 19:58 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: chuck.lever, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Adrian Bunk, OGAWA Hirofumi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1167 bytes --]

On Thu, Apr 19, 2007 at 12:09:42PM -0400, Trond Myklebust wrote:
> See
>    http://client.linux-nfs.org/Linux-2.6.x/2.6.21-rc7/
> 
> I'm giving the first 5 patches of that series (i.e.
> linux-2.6.21-001-cleanup_unstable_write.dif to
> linux-2.6.21-005-fix_nfsv4_resend.dif) an extra beating since those are
> the ones that I feel should go into 2.6.21 final in order to fix the
> read/write regressions that have been reported. They should be identical
> to the patches that I posted on lkml in the past 3 days.
> 
> Please feel free to grab them and give them a test.

The copy completed some time ago, but now I cannot ssh into the box!
This is a new development, as before I was always able to ssh into it,
even when the copy slowed down to a trickle.

I'm far from the machine right now, so I will do some more tests
tonight, but right now, the new patchset is not good.  What is the
difference between reverting the patch you sent yesterday and your
current fifth patch?  I assume the other four are identical, right?

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Failure! Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-19 19:58                                                       ` Failure! " Florin Iucha
@ 2007-04-19 21:30                                                         ` Trond Myklebust
  2007-04-19 21:49                                                           ` Florin Iucha
  0 siblings, 1 reply; 43+ messages in thread
From: Trond Myklebust @ 2007-04-19 21:30 UTC (permalink / raw)
  To: Florin Iucha
  Cc: chuck.lever, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Adrian Bunk, OGAWA Hirofumi, linux-kernel

On Thu, 2007-04-19 at 14:58 -0500, Florin Iucha wrote:
> On Thu, Apr 19, 2007 at 12:09:42PM -0400, Trond Myklebust wrote:
> > See
> >    http://client.linux-nfs.org/Linux-2.6.x/2.6.21-rc7/
> > 
> > I'm giving the first 5 patches of that series (i.e.
> > linux-2.6.21-001-cleanup_unstable_write.dif to
> > linux-2.6.21-005-fix_nfsv4_resend.dif) an extra beating since those are
> > the ones that I feel should go into 2.6.21 final in order to fix the
> > read/write regressions that have been reported. They should be identical
> > to the patches that I posted on lkml in the past 3 days.
> > 
> > Please feel free to grab them and give them a test.
> 
> The copy completed some time ago, but now I cannot ssh into the box!
> This is a new development, as before I was always able to ssh into it,
> even when the copy slowed down to a trickle.
> 
> I'm far from the machine right now, so I will do some more tests
> tonight, but right now, the new patchset is not good.  What is the
> difference between reverting the patch you sent yesterday and your
> current fifth patch?  I assume the other four are identical, right?

The only difference is the way in which we handle retries of an NFSv4
request: the new patch disconnects if and only if a timeout has
occurred, or the server sends us garbage.

Trond

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Failure! Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-19 21:30                                                         ` Trond Myklebust
@ 2007-04-19 21:49                                                           ` Florin Iucha
  2007-04-20 13:30                                                             ` Success! Was: " Florin Iucha
  0 siblings, 1 reply; 43+ messages in thread
From: Florin Iucha @ 2007-04-19 21:49 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: chuck.lever, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Adrian Bunk, OGAWA Hirofumi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 954 bytes --]

On Thu, Apr 19, 2007 at 05:30:42PM -0400, Trond Myklebust wrote:
> > I'm far from the machine right now, so I will do some more tests
> > tonight, but right now, the new patchset is not good.  What is the
> > difference between reverting the patch you sent yesterday and your
> > current fifth patch?  I assume the other four are identical, right?
> 
> The only difference is the way in which we handle retries of an NFSv4
> request: the new patch disconnects if and only if a timeout has
> occurred, or the server sends us garbage.

I have to mention that I rebased to the head of the tree
(895e1fc7226e6732bc77138955b6c7dfa279f57a) before applying your
patches, in order to test what I expect the official tree to be.

Tonight I'll test this kernel once more, then go back to 21-rc7 and
apply your 5 patches and re-test.

florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-19 21:49                                                           ` Florin Iucha
@ 2007-04-20 13:30                                                             ` Florin Iucha
  2007-04-20 13:37                                                               ` Trond Myklebust
  0 siblings, 1 reply; 43+ messages in thread
From: Florin Iucha @ 2007-04-20 13:30 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: chuck.lever, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Adrian Bunk, OGAWA Hirofumi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1493 bytes --]

On Thu, Apr 19, 2007 at 04:49:31PM -0500, Florin Iucha wrote:
> On Thu, Apr 19, 2007 at 05:30:42PM -0400, Trond Myklebust wrote:
> > > I'm far from the machine right now, so I will do some more tests
> > > tonight, but right now, the new patchset is not good.  What is the
> > > difference between reverting the patch you sent yesterday and your
> > > current fifth patch?  I assume the other four are identical, right?
> > 
> > The only difference is the way in which we handle retries of an NFSv4
> > request: the new patch disconnects if and only if a timeout has
> > occurred, or the server sends us garbage.
> 
> I have to mention that I rebased to the head of the tree
> (895e1fc7226e6732bc77138955b6c7dfa279f57a) before applying your
> patches, in order to test what I expect the official tree to be.
> 
> Tonight I'll test this kernel once more, then go back to 21-rc7 and
> apply your 5 patches and re-test.

It passed big-copy, and the copy run from the gnome-session while I
did my morning light browsing, email reading, etc.

kernel:
   895e1fc7226e6732bc77138955b6c7dfa279f57a

patches:
   linux-2.6.21-001-cleanup_unstable_write.dif
   linux-2.6.21-002-defer_clearing_pg_writeback.dif
   linux-2.6.21-003-fix_desynchronised_ncommit.dif
   linux-2.6.21-004-fix_nfs_set_page_dirty.dif
   linux-2.6.21-005-fix_nfsv4_resend.dif

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-20 13:30                                                             ` Success! Was: " Florin Iucha
@ 2007-04-20 13:37                                                               ` Trond Myklebust
  2007-04-20 13:51                                                                 ` Florin Iucha
  0 siblings, 1 reply; 43+ messages in thread
From: Trond Myklebust @ 2007-04-20 13:37 UTC (permalink / raw)
  To: Florin Iucha
  Cc: chuck.lever, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Adrian Bunk, OGAWA Hirofumi, linux-kernel

On Fri, 2007-04-20 at 08:30 -0500, Florin Iucha wrote:
> On Thu, Apr 19, 2007 at 04:49:31PM -0500, Florin Iucha wrote:
> > On Thu, Apr 19, 2007 at 05:30:42PM -0400, Trond Myklebust wrote:
> > > > I'm far from the machine right now, so I will do some more tests
> > > > tonight, but right now, the new patchset is not good.  What is the
> > > > difference between reverting the patch you sent yesterday and your
> > > > current fifth patch?  I assume the other four are identical, right?
> > > 
> > > The only difference is the way in which we handle retries of an NFSv4
> > > request: the new patch disconnects if and only if a timeout has
> > > occurred, or the server sends us garbage.
> > 
> > I have to mention that I rebased to the head of the tree
> > (895e1fc7226e6732bc77138955b6c7dfa279f57a) before applying your
> > patches, in order to test what I expect the official tree to be.
> > 
> > Tonight I'll test this kernel once more, then go back to 21-rc7 and
> > apply your 5 patches and re-test.
> 
> It passed big-copy, and the copy run from the gnome-session while I
> did my morning light browsing, email reading, etc.
> 
> kernel:
>    895e1fc7226e6732bc77138955b6c7dfa279f57a
> 
> patches:
>    linux-2.6.21-001-cleanup_unstable_write.dif
>    linux-2.6.21-002-defer_clearing_pg_writeback.dif
>    linux-2.6.21-003-fix_desynchronised_ncommit.dif
>    linux-2.6.21-004-fix_nfs_set_page_dirty.dif
>    linux-2.6.21-005-fix_nfsv4_resend.dif
> 
> Regards,
> florin

Thanks! Did you ever find out what had happened to the test that hung
last night?

Cheers
  Trond

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-20 13:37                                                               ` Trond Myklebust
@ 2007-04-20 13:51                                                                 ` Florin Iucha
  0 siblings, 0 replies; 43+ messages in thread
From: Florin Iucha @ 2007-04-20 13:51 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: chuck.lever, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Adrian Bunk, OGAWA Hirofumi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 462 bytes --]

On Fri, Apr 20, 2007 at 09:37:30AM -0400, Trond Myklebust wrote:
> Thanks! Did you ever find out what had happened to the test that hung
> last night?

Nope.  I could not ssh into it and the machine was needed for some
Windows duty before I got home ;)  I'll try again this coming weekend
and let you know if I see any problems.

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-18  5:37                                 ` Andrew Morton
  2007-04-18 12:38                                   ` Florin Iucha
@ 2007-04-29 19:41                                   ` Rogier Wolff
  2007-04-29 20:09                                     ` Peter Zijlstra
  1 sibling, 1 reply; 43+ messages in thread
From: Rogier Wolff @ 2007-04-29 19:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Florin Iucha, Trond Myklebust, Peter Zijlstra,
	Adrian Bunk, OGAWA Hirofumi, linux-kernel

On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
> Florin, can we please see /proc/meminfo as well?
> 
> Also the result of `echo m > /proc/sysrq-trigger'

Hi,

It's been a while since this thread died out, but maybe I'm 
having the same problem. Networking, large part of memory is 
buffering writes..... 

In my case I'm using NBD. 

Oh, 

/sys/block/nbd0/stat gives:
     636       88     5353     1700      991    19554   162272    63156       43  1452000 61802352
I put some debugging stuff in nbd, and it DOES NOT KNOW about the
43 requests that the io scheduler claims are in flight at the
driver.... 
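
As an aside for readers, the "43" above is the ninth field of the stat
line, which counts requests currently in flight. A minimal sketch of
reading those fields follows; struct blk_stat and parse_blk_stat() are
hypothetical names, and the field order is assumed to follow the kernel's
block layer stat documentation (11 fields, as in the line above).

```c
#include <stdio.h>

/* Assumed field order of /sys/block/<dev>/stat (11 fields). */
struct blk_stat {
	unsigned long long rd_ios, rd_merges, rd_sectors, rd_ticks;
	unsigned long long wr_ios, wr_merges, wr_sectors, wr_ticks;
	unsigned long long in_flight, io_ticks, time_in_queue;
};

/* Parse one stat line; returns 0 on success, -1 if fields are missing. */
static int parse_blk_stat(const char *line, struct blk_stat *s)
{
	int n = sscanf(line,
		       "%llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
		       &s->rd_ios, &s->rd_merges, &s->rd_sectors, &s->rd_ticks,
		       &s->wr_ios, &s->wr_merges, &s->wr_sectors, &s->wr_ticks,
		       &s->in_flight, &s->io_ticks, &s->time_in_queue);

	return n == 11 ? 0 : -1;
}
```

Feeding it the line quoted above should leave in_flight at 43, the count
that the nbd driver itself reportedly does not know about.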

Those requests start a couple of seconds AFTER the whole thing
grinds to a halt.

I switched from crashing my workstation (512 MB of RAM) to my development
machine, which has only 64 MB of RAM. (I got the development machine
back up and running after some effort). 

My rsync (and also "sync" if I call it, or reboot without -n) also
gets stuck in D state: 

<4>[  622.364000] rsync         D 0019C170     0  2456   2455 (NOTLB)
<4>[  622.364000]        c04d7c80 00000086 c1f61ba8 0019c170 00000000 00000008 c31048a0 0003382e 
<4>[  622.364000]        00000000 c24c9908 00000286 c1092740 c3e12590 c2330a50 c2330b5c 00061a80 
<4>[  622.364000]        8639a400 00000078 c04d7cd0 00000000 c10801b8 c04d7c88 c03082af c0176c48 
<4>[  622.364000] Call Trace:
<4>[  622.364000]  [<c03082af>] io_schedule+0xe/0x16
<4>[  622.364000]  [<c0176c48>] sync_buffer+0x0/0x2e
<4>[  622.364000]  [<c0176c73>] sync_buffer+0x2b/0x2e
<4>[  622.364000]  [<c03083b9>] __wait_on_bit+0x2c/0x51
<4>[  622.364000]  [<c0176c48>] sync_buffer+0x0/0x2e
<4>[  622.364000]  [<c0308451>] out_of_line_wait_on_bit+0x73/0x7b
<4>[  622.364000]  [<c012970e>] wake_bit_function+0x0/0x3c
<4>[  622.364000]  [<c012970e>] wake_bit_function+0x0/0x3c
<4>[  622.364000]  [<c0176cce>] __wait_on_buffer+0x22/0x25
<4>[  622.364000]  [<c0198cf0>] ext3_find_entry+0x1aa/0x36f
<4>[  622.364000]  [<c01a2324>] journal_dirty_metadata+0x1b6/0x1d3
<4>[  622.364000]  [<c01990e7>] ext3_lookup+0x28/0xc6
<4>[  622.364000]  [<c0161611>] real_lookup+0x53/0xc2
<4>[  622.364000]  [<c0161881>] do_lookup+0x57/0x9d
<4>[  622.364000]  [<c0162075>] __link_path_walk+0x7ae/0xb81
<4>[  622.364000]  [<c011cb77>] __do_softirq+0x57/0x83
<4>[  622.364000]  [<c0162485>] link_path_walk+0x3d/0xa0
<4>[  622.364000]  [<c015a4e7>] sys_lchown+0x3c/0x44
<4>[  622.364000]  [<c015a8c7>] get_unused_fd+0xa0/0xbc
<4>[  622.364000]  [<c0162840>] do_path_lookup+0x1b7/0x200
<4>[  622.364000]  [<c01628e1>] __path_lookup_intent_open+0x42/0x72
<4>[  622.364000]  [<c0162931>] path_lookup_open+0x20/0x25
<4>[  622.364000]  [<c0163026>] open_namei+0x8c/0x532
<4>[  622.364000]  [<c015a328>] sys_fchmodat+0xac/0xb9
<4>[  622.364000]  [<c015a71b>] do_filp_open+0x25/0x39
<4>[  622.364000]  [<c015a4e7>] sys_lchown+0x3c/0x44
<4>[  622.364000]  [<c015a8c7>] get_unused_fd+0xa0/0xbc
<4>[  622.364000]  [<c015a9d5>] do_sys_open+0x42/0xbe
<4>[  622.364000]  [<c015aa6b>] sys_open+0x1a/0x1c
<4>[  622.364000]  [<c0103dbc>] syscall_call+0x7/0xb
<4>[  622.364000]  =======================

----------------------
<6>[  871.520000] SysRq : Show Memory
<6>[  871.520000] Mem-info:
<4>[  871.520000] DMA per-cpu:
<4>[  871.520000] CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
<4>[  871.520000] Normal per-cpu:
<4>[  871.520000] CPU    0: Hot: hi:    6, btch:   1 usd:   0   Cold: hi:    2, btch:   1 usd:   0
<4>[  871.520000] Active:5632 inactive:6764 dirty:0 writeback:302 unstable:0
<4>[  871.520000]  free:717 slab:2024 mapped:926 pagetables:135 bounce:0
<4>[  871.520000] DMA free:1104kB min:252kB low:312kB high:376kB active:3600kB inactive:6820kB present:16256kB pages_scanned:0 all_unreclaimable? no
<4>[  871.520000] lowmem_reserve[]: 0 47
<4>[  871.520000] Normal free:1764kB min:760kB low:948kB high:1140kB active:18928kB inactive:20236kB present:48708kB pages_scanned:0 all_unreclaimable? no
<4>[  871.520000] lowmem_reserve[]: 0 0
<4>[  871.520000] DMA: 118*4kB 19*8kB 2*16kB 2*32kB 0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1104kB
<4>[  871.520000] Normal: 171*4kB 23*8kB 0*16kB 0*32kB 4*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1764kB
<4>[  871.520000] Swap cache: add 0, delete 0, find 0/0, race 0+0
<4>[  871.520000] Free swap  = 0kB
<4>[  871.520000] Total swap = 0kB
<6>[  871.520000] Free swap:            0kB
<6>[  871.520000] 16368 pages of RAM
<6>[  871.520000] 0 pages of HIGHMEM
<6>[  871.520000] 1044 reserved pages
<6>[  871.520000] 13456 pages shared
<6>[  871.520000] 0 pages swap cached
<6>[  871.520000] 0 pages dirty
<6>[  871.520000] 302 pages writeback
<6>[  871.520000] 926 pages mapped
<6>[  871.520000] 2024 pages slab
<6>[  871.520000] 135 pages pagetables

----------------------
ozon:/home/wolff# cat /proc/meminfo 
MemTotal:        61296 kB
MemFree:          2752 kB
Buffers:          2228 kB
Cached:          29968 kB
SwapCached:          0 kB
Active:          22632 kB
Inactive:        27056 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               0 kB
Writeback:        1208 kB
AnonPages:       17512 kB
Mapped:           3704 kB
Slab:             8088 kB
SReclaimable:     3656 kB
SUnreclaim:       4432 kB
PageTables:        552 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:     30648 kB
Committed_AS:    50092 kB
VmallocTotal:   974548 kB
VmallocUsed:       208 kB
VmallocChunk:   974160 kB
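
The SysRq dump counts pages while /proc/meminfo reports kB, so the two can
be cross-checked. The sketch below assumes 4 kB pages (the usual i386 page
size; this is an assumption, not something stated in the dumps), and
pages_to_kb() is a hypothetical helper name.

```c
/* Assumption: 4 kB pages, as is usual on i386. */
enum { PAGE_KB = 4 };

/* Convert a page count from the SysRq dump to kB as shown in meminfo. */
static unsigned long pages_to_kb(unsigned long pages)
{
	return pages * PAGE_KB;
}
```

Under that assumption, 302 writeback pages give the 1208 kB shown as
Writeback, 926 mapped pages give the 3704 kB shown as Mapped, and
16368 - 1044 reserved pages give the 61296 kB shown as MemTotal.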



-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. 
Does it sit on the couch all day? Is it unemployed? Please be specific! 
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues
  2007-04-29 19:41                                   ` Rogier Wolff
@ 2007-04-29 20:09                                     ` Peter Zijlstra
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2007-04-29 20:09 UTC (permalink / raw)
  To: Rogier Wolff
  Cc: Andrew Morton, Linus Torvalds, Florin Iucha, Trond Myklebust,
	Adrian Bunk, OGAWA Hirofumi, linux-kernel

On Sun, 2007-04-29 at 21:41 +0200, Rogier Wolff wrote:
> On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
> > Florin, can we please see /proc/meminfo as well?
> > 
> > Also the result of `echo m > /proc/sysrq-trigger'
> 
> Hi,
> 
> It's been a while since this thread died out, but maybe I'm 
> having the same problem. Networking, large part of memory is 
> buffering writes..... 
> 
> In my case I'm using NBD. 
> 
> Oh, 
> 
> /sys/block/nbd0/stat gives:
>      636       88     5353     1700      991    19554   162272    63156       43  1452000 61802352
> I put some debugging stuff in nbd, and it DOES NOT KNOW about the
> 43 requests that the io scheduler claims are in flight at the
> driver.... 

AFAIK nbd is a tad broken; the following patch used to fix it, although
not in the proper way. Hence it never got merged.

There is a race where the plug state of the device queue gets confused,
which causes requests to just sit on the queue, without further action.

---

Subject: nbd: request_fn fixup

Dropping the queue_lock opens up a nasty race; fix this race by
plugging the device when we're done.

Also includes a small cleanup.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Daniel Phillips <phillips@google.com>
CC: Pavel Machek <pavel@ucw.cz>
---
 drivers/block/nbd.c |   67 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 49 insertions(+), 18 deletions(-)

Index: linux-2.6/drivers/block/nbd.c
===================================================================
--- linux-2.6.orig/drivers/block/nbd.c	2006-09-07 17:20:52.000000000 +0200
+++ linux-2.6/drivers/block/nbd.c	2006-09-07 17:35:05.000000000 +0200
@@ -97,20 +97,24 @@ static const char *nbdcmd_to_ascii(int c
 }
 #endif /* NDEBUG */
 
-static void nbd_end_request(struct request *req)
+static void __nbd_end_request(struct request *req)
 {
 	int uptodate = (req->errors == 0) ? 1 : 0;
-	request_queue_t *q = req->q;
-	unsigned long flags;
 
 	dprintk(DBG_BLKDEV, "%s: request %p: %s\n", req->rq_disk->disk_name,
 			req, uptodate? "done": "failed");
 
-	spin_lock_irqsave(q->queue_lock, flags);
-	if (!end_that_request_first(req, uptodate, req->nr_sectors)) {
+	if (!end_that_request_first(req, uptodate, req->nr_sectors))
 		end_that_request_last(req, uptodate);
-	}
-	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
+static void nbd_end_request(struct request *req)
+{
+	request_queue_t *q = req->q;
+
+	spin_lock_irq(q->queue_lock);
+	__nbd_end_request(req);
+	spin_unlock_irq(q->queue_lock);
 }
 
 /*
@@ -435,10 +439,8 @@ static void do_nbd_request(request_queue
 			mutex_unlock(&lo->tx_lock);
 			printk(KERN_ERR "%s: Attempted send on closed socket\n",
 			       lo->disk->disk_name);
-			req->errors++;
-			nbd_end_request(req);
 			spin_lock_irq(q->queue_lock);
-			continue;
+			goto error_out;
 		}
 
 		lo->active_req = req;
@@ -463,10 +465,13 @@ static void do_nbd_request(request_queue
 
 error_out:
 		req->errors++;
-		spin_unlock(q->queue_lock);
-		nbd_end_request(req);
-		spin_lock(q->queue_lock);
+		__nbd_end_request(req);
 	}
+	/*
+	 * q->queue_lock has been dropped, which opens up a race;
+	 * plug the device to close it.
+	 */
+	blk_plug_device(q);
 	return;
 }
 



^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2007-04-29 20:09 UTC | newest]

Thread overview: 43+ messages
     [not found] <20070416125905.GA2769@iucha.net>
     [not found] ` <1176736734.6761.45.camel@heimdal.trondhjem.org>
     [not found]   ` <Pine.LNX.4.64.0704160904560.5473@woody.linux-foundation.org>
     [not found]     ` <1176740307.6761.56.camel@heimdal.trondhjem.org>
     [not found]       ` <1176741408.6761.62.camel@heimdal.trondhjem.org>
     [not found]         ` <1176792399.3035.30.camel@twins>
     [not found]           ` <1176796503.3035.33.camel@twins>
2007-04-17 17:01             ` nfs: desynchronized value of nfs_i.ncommit OGAWA Hirofumi
2007-04-17 22:44               ` Trond Myklebust
2007-04-18  1:19               ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Trond Myklebust
2007-04-18  1:29                 ` [PATCH 1/4] NFS: clean up the unstable write code Trond Myklebust
2007-04-18  1:29                 ` [PATCH 2/4] NFS: Don't clear PG_writeback until after we've processed unstable writes Trond Myklebust
2007-04-18  1:29                 ` [PATCH 3/4] NFS: Fix the 'desynchronized value of nfs_i.ncommit' error Trond Myklebust
2007-04-18  1:29                 ` [PATCH 4/4] NFS: Fix race in nfs_set_page_dirty Trond Myklebust
2007-04-18  2:58                 ` [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues Andrew Morton
2007-04-18  3:06                   ` Trond Myklebust
2007-04-18  3:30                     ` Florin Iucha
2007-04-18  3:54                       ` Trond Myklebust
2007-04-18  4:07                         ` Florin Iucha
2007-04-18  4:13                           ` Andrew Morton
2007-04-18  4:30                             ` Florin Iucha
2007-04-18  5:14                               ` Linus Torvalds
2007-04-18  5:26                                 ` Florin Iucha
2007-04-18  5:37                                 ` Andrew Morton
2007-04-18 12:38                                   ` Florin Iucha
2007-04-18 13:15                                     ` Trond Myklebust
2007-04-18 13:42                                       ` Florin Iucha
2007-04-18 14:11                                         ` Trond Myklebust
2007-04-18 14:17                                           ` Florin Iucha
2007-04-18 14:19                                             ` Trond Myklebust
2007-04-19  1:52                                           ` Florin Iucha
2007-04-19  2:45                                             ` Trond Myklebust
2007-04-19  4:38                                               ` Success! Was: " Florin Iucha
2007-04-19 15:12                                               ` Chuck Lever
2007-04-19 15:17                                                 ` Trond Myklebust
2007-04-19 15:50                                                   ` Florin Iucha
2007-04-19 16:09                                                     ` Trond Myklebust
2007-04-19 19:58                                                       ` Failure! " Florin Iucha
2007-04-19 21:30                                                         ` Trond Myklebust
2007-04-19 21:49                                                           ` Florin Iucha
2007-04-20 13:30                                                             ` Success! Was: " Florin Iucha
2007-04-20 13:37                                                               ` Trond Myklebust
2007-04-20 13:51                                                                 ` Florin Iucha
2007-04-18 14:14                                         ` Florin Iucha
2007-04-29 19:41                                   ` Rogier Wolff
2007-04-29 20:09                                     ` Peter Zijlstra
2007-04-18 11:38                           ` Trond Myklebust
2007-04-18  9:54                     ` OGAWA Hirofumi
2007-04-18  8:19                 ` Peter Zijlstra
2007-04-18 16:41                   ` Peter Zijlstra
