* Re: NFS client errors with 2.5.74? [not found] ` <16141.15245.367725.364913@charged.uio.no> @ 2003-07-10 15:04 ` Jamie Lokier [not found] ` <20030710150012.GA29113@mail.jlokier.co.uk> 1 sibling, 0 replies; 12+ messages in thread From: Jamie Lokier @ 2003-07-10 15:04 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-kernel > > kernel: nfs: server 192.168.1.1 not responding, timed out Trond Myklebust wrote: > I can never guarantee you perfect service with soft mounts (a 5 second > network partition/server congestion is all it takes) There is _no_ congestion to speak of: I type "ls directory" and it returns EIO apparently immediately, from an otherwise idle network and unloaded server. The "server 192.168.1.1 not responding, timed out" message also appears immediately in this case - there is no 5 second delay. There is no 0.7 second delay either (the default value of "timeo" according to nfs(5)). So the retransmission logic is buggered. > Sigh... I hate soft mounts... Have I said that before? 8-) If I switch to a hard mount ("hard,intr") the EIO errors go away. However, the protocol problem remains: multiple READDIRPLUS calls with the same xid in a fraction of a second. Note: there is no 0.7 second delay between these packets. According to Ethereal, it is between 0.01 and 0.1 seconds between duplicate requests. There seems to be a transition from a state where calls with duplicate xids are rare (but they do occur), to one where they occur on nearly every request. I have 768MB of RAM on the client, so I checked whether RAM being filled makes a difference. Not really. After mounting, if I do "ls -lR" then I see that duplicate xids are rare for a while, then they become common. In this state, I still have 400MB free (i.e. not even filled with clean pagecache pages), so it is not an absolute shortage of RAM which triggers this, but something else. 
I suspect the request timeout logic is buggered, and sending retries too quickly - 0.01 to 0.1 seconds rather than 0.7 upwards. It would also explain why "soft" is failing quickly: if the timeout logic thinks it has already sent the maximum number of retries in a very short time, it will count it as a timeout even though the server is quite fast in responding. It's interesting that this state can be reached even when the network, client and server are idle and I try "ls directory" for some uncached directory. This shows it's not purely a question of congestion, but that even a fast response is not good enough. -- Jamie ^ permalink raw reply [flat|nested] 12+ messages in thread
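The adaptive timeout Jamie is fighting is, conceptually, the classic Van Jacobson/Karels round-trip estimator (RTO = smoothed RTT + 4 * mean deviation). A minimal userspace sketch of that estimator — illustrative fixed-point arithmetic, not the kernel's actual rpc_calc_rto() code — shows why a run of sub-millisecond cached replies drags the computed timeout far below 0.7 s, at which point a slower disk-bound reply looks like a lost packet:

```c
/* Sketch of the VJ/Karels RTT estimator; all values in microseconds.
 * Fixed-point trick: srtt is stored scaled by 8, sdrtt (mean deviation)
 * scaled by 4, as in BSD TCP.  Names and scaling are illustrative. */

struct rtt_est {
    long srtt;   /* smoothed RTT, scaled by 8 */
    long sdrtt;  /* smoothed mean deviation, scaled by 4 */
};

/* Feed one RTT sample (usec) into the estimator. */
static void rtt_update(struct rtt_est *e, long sample)
{
    long err = sample - (e->srtt >> 3);
    e->srtt += err;                     /* srtt <- 7/8 srtt + 1/8 sample */
    if (err < 0)
        err = -err;
    e->sdrtt += err - (e->sdrtt >> 2);  /* dev <- 3/4 dev + 1/4 |err| */
}

/* RTO = srtt + 4 * mean deviation (both unscale to that). */
static long rtt_rto(const struct rtt_est *e)
{
    return (e->srtt >> 3) + e->sdrtt;
}
```

Starting the estimator at the nominal 0.7 s and feeding it fifty ~200 usec cached GETATTR replies drives the RTO down into the low-millisecond range, so a 0.2 s disk-bound READDIRPLUS is retransmitted several times before the genuine reply arrives — matching the 0.01-0.1 s duplicate spacing in the trace.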
[parent not found: <20030710150012.GA29113@mail.jlokier.co.uk>]
[parent not found: <16141.32852.39625.891724@charged.uio.no>]
* Re: NFS client errors with 2.5.74? [not found] ` <16141.32852.39625.891724@charged.uio.no> @ 2003-07-10 15:35 ` Jamie Lokier 2003-07-10 15:45 ` Trond Myklebust 2003-07-10 23:36 ` Trond Myklebust 0 siblings, 2 replies; 12+ messages in thread From: Jamie Lokier @ 2003-07-10 15:35 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-kernel Trond Myklebust wrote: > >>>>> " " == Jamie Lokier <jamie@shareable.org> writes: > > > > There is no 0.7 second delay either (the default value of > > "timeo" according to nfs(5)). So the retransmission logic is > > buggered. > > No. manpage is buggered. We use round trip time estimation now... > > > > However, the protocol problem remains: multiple READDIRPLUS > > calls with the same xid in a fraction of a second. Note: there > > is no 0.7 second delay between these packets. According to > > Ethereal, it is between > > 0.01 and 0.1 seconds between duplicate requests. > > Yup. That's what the RTO does... So the problem state arises when the round trip time converges on some value which is too small? And in this state, "soft" returns EIO within some ridiculously small timeout? It is definitely too small because I see that the bulk of readdir requests are fine for the first few thousand, but nearly all of them have one or more duplicates after that. There is no justification for those duplicates - all the original requests are getting replies. It's possible that the server is taking longer to respond to READDIRPLUS than to GETATTR. That makes sense, because the READDIRPLUS reads inodes from disk, then GETATTR just reads them from memory. Let's take a look... Yup! In the traces, GETATTR responds consistently in 0.0002 seconds (close to ping time, clearly no disk access), whereas READDIRPLUS takes some 0.2 seconds to respond. I see that the RTT estimation _is_ adapting quickly to this: the first READDIRPLUS in a sequence has several duplicates; subsequent ones don't. 
Suggestion: - With RTT estimation, "not responding" should require a certain amount of absolute time to pass, e.g. 5 seconds, not just a fixed number of packets. Without this, "soft" is broken because the time difference between a response for cached data and a response from the disk is a factor of several hundred. Note that I'm looking at "ls -R" now because it's easy to repeat, but the EIO problem occurs with file reading and writing too. -- Jamie P.S. Another quirk: I see the server sending more than one duplicate reply to each duplicate READDIRPLUS request... Well, that's a separate problem I think :) ^ permalink raw reply [flat|nested] 12+ messages in thread
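Jamie's suggestion — require a wall-clock floor in addition to the retry budget before a soft mount gives up — can be sketched as a single predicate. The names and the 5-second threshold are illustrative, not kernel API:

```c
#include <stdbool.h>

#define SOFT_MIN_WAIT_USEC 5000000L  /* hypothetical 5 s absolute floor */
#define SOFT_MAX_RETRIES   3         /* hypothetical retry budget */

/* Should a soft-mounted request fail with EIO?  With an adaptive RTO,
 * the retry count alone can be exhausted in under 0.1 s, so the
 * decision also requires real elapsed time since first transmission. */
static bool soft_should_fail(int retries, long elapsed_usec)
{
    return retries >= SOFT_MAX_RETRIES &&
           elapsed_usec >= SOFT_MIN_WAIT_USEC;
}
```

Under this rule, three retransmissions within 80 ms (the observed failure mode) would not yet return EIO, while a genuinely dead server still times out after the floor elapses.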
* Re: NFS client errors with 2.5.74? 2003-07-10 15:35 ` Jamie Lokier @ 2003-07-10 15:45 ` Trond Myklebust 2003-07-10 23:36 ` Trond Myklebust 1 sibling, 0 replies; 12+ messages in thread From: Trond Myklebust @ 2003-07-10 15:45 UTC (permalink / raw) To: Jamie Lokier; +Cc: Trond Myklebust, linux-kernel >>>>> " " == Jamie Lokier <jamie@shareable.org> writes: > all the original requests are getting replies. OK. That *is* definitely a bug... Let me ponder a bit. I'm working through that logic right now... Cheers, Trond ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: NFS client errors with 2.5.74? 2003-07-10 15:35 ` Jamie Lokier 2003-07-10 15:45 ` Trond Myklebust @ 2003-07-10 23:36 ` Trond Myklebust 2003-07-12 15:13 ` Jamie Lokier 1 sibling, 1 reply; 12+ messages in thread From: Trond Myklebust @ 2003-07-10 23:36 UTC (permalink / raw) To: Jamie Lokier; +Cc: linux-kernel [-- Attachment #1: message body text --] [-- Type: text/plain, Size: 583 bytes --] OK... I have two patches for you... (I've attached them as plaintext MIME thingmajigs)... The first one should fix the problem of the kernel missing replies while we are busy trying to resend a request. The second, solves a problem of resource starvation. The fact that we can currently just submit arbitrary numbers of asynchronous requests means that we can exhaust resources to the point where the socket starts dropping replies. This patch limits the number of outstanding asynchronous requests to 16 per socket (the maximum number of xprt/transport slots). Cheers, Trond [-- Attachment #2: patch 1/2 --] [-- Type: text/plain, Size: 6341 bytes --] diff -u --recursive --new-file linux-2.5.74/include/linux/sunrpc/xprt.h linux-2.5.74-13-resends/include/linux/sunrpc/xprt.h --- linux-2.5.74/include/linux/sunrpc/xprt.h 2003-07-10 22:52:35.000000000 +0200 +++ linux-2.5.74-13-resends/include/linux/sunrpc/xprt.h 2003-07-10 22:53:45.000000000 +0200 @@ -111,7 +111,7 @@ unsigned long rq_xtime; /* when transmitted */ int rq_ntimeo; - int rq_nresend; + int rq_ntrans; }; #define rq_svec rq_snd_buf.head #define rq_slen rq_snd_buf.len diff -u --recursive --new-file linux-2.5.74/net/sunrpc/clnt.c linux-2.5.74-13-resends/net/sunrpc/clnt.c --- linux-2.5.74/net/sunrpc/clnt.c 2003-06-12 04:22:40.000000000 +0200 +++ linux-2.5.74-13-resends/net/sunrpc/clnt.c 2003-07-10 21:43:21.000000000 +0200 @@ -658,7 +658,7 @@ if (task->tk_status < 0) return; task->tk_status = xprt_prepare_transmit(task); - if (task->tk_status < 0) + if (task->tk_status != 0) return; /* Encode here so that rpcsec_gss can use correct 
sequence number. */ if (!task->tk_rqstp->rq_bytes_sent) @@ -684,7 +684,7 @@ struct rpc_rqst *req = task->tk_rqstp; int status; - if (req->rq_received != 0) + if (req->rq_received != 0 && !req->rq_bytes_sent) task->tk_status = req->rq_received; dprintk("RPC: %4d call_status (status %d)\n", @@ -743,7 +743,7 @@ dprintk("RPC: %4d call_timeout (major)\n", task->tk_pid); if (clnt->cl_softrtry) { - if (clnt->cl_chatty && !task->tk_exit) + if (clnt->cl_chatty) printk(KERN_NOTICE "%s: server %s not responding, timed out\n", clnt->cl_protname, clnt->cl_server); rpc_exit(task, -EIO); @@ -786,8 +786,9 @@ if (task->tk_status < 12) { if (!clnt->cl_softrtry) { - task->tk_action = call_transmit; + task->tk_action = call_bind; clnt->cl_stats->rpcretrans++; + req->rq_received = 0; } else { printk(KERN_WARNING "%s: too small RPC reply size (%d bytes)\n", clnt->cl_protname, task->tk_status); @@ -797,8 +798,10 @@ } /* Verify the RPC header */ - if (!(p = call_verify(task))) + if (!(p = call_verify(task))) { + req->rq_received = 0; return; + } /* * The following is an NFS-specific hack to cater for setuid diff -u --recursive --new-file linux-2.5.74/net/sunrpc/xprt.c linux-2.5.74-13-resends/net/sunrpc/xprt.c --- linux-2.5.74/net/sunrpc/xprt.c 2003-06-10 06:26:58.000000000 +0200 +++ linux-2.5.74-13-resends/net/sunrpc/xprt.c 2003-07-10 22:53:36.000000000 +0200 @@ -140,15 +140,21 @@ static int __xprt_lock_write(struct rpc_xprt *xprt, struct rpc_task *task) { + struct rpc_rqst *req = task->tk_rqstp; if (!xprt->snd_task) { - if (xprt->nocong || __xprt_get_cong(xprt, task)) + if (xprt->nocong || __xprt_get_cong(xprt, task)) { xprt->snd_task = task; + if (req) { + req->rq_bytes_sent = 0; + req->rq_ntrans++; + } + } } if (xprt->snd_task != task) { dprintk("RPC: %4d TCP write queue full\n", task->tk_pid); task->tk_timeout = 0; task->tk_status = -EAGAIN; - if (task->tk_rqstp && task->tk_rqstp->rq_nresend) + if (req && req->rq_ntrans) rpc_sleep_on(&xprt->resend, task, NULL, NULL); else 
rpc_sleep_on(&xprt->sending, task, NULL, NULL); @@ -183,8 +189,14 @@ if (!task) return; } - if (xprt->nocong || __xprt_get_cong(xprt, task)) + if (xprt->nocong || __xprt_get_cong(xprt, task)) { + struct rpc_rqst *req = task->tk_rqstp; xprt->snd_task = task; + if (req) { + req->rq_bytes_sent = 0; + req->rq_ntrans++; + } + } } /* @@ -424,6 +436,9 @@ if (xprt_connected(xprt)) goto out_write; + if (task->tk_rqstp) + task->tk_rqstp->rq_bytes_sent = 0; + /* * We're here because the xprt was marked disconnected. * Start by resetting any existing state. @@ -567,7 +582,7 @@ if (!xprt->nocong) { xprt_adjust_cwnd(xprt, copied); __xprt_put_cong(xprt, req); - if (!req->rq_nresend) { + if (req->rq_ntrans == 1) { unsigned timer = task->tk_msg.rpc_proc->p_timer; if (timer) @@ -1075,8 +1090,8 @@ } rpc_inc_timeo(&task->tk_client->cl_rtt); xprt_adjust_cwnd(req->rq_xprt, -ETIMEDOUT); + __xprt_put_cong(xprt, req); } - req->rq_nresend++; dprintk("RPC: %4d xprt_timer (%s request)\n", task->tk_pid, req ? "pending" : "backlogged"); @@ -1109,6 +1124,11 @@ rpc_remove_wait_queue(task); spin_lock_bh(&xprt->sock_lock); + if (req->rq_received) { + err = req->rq_received; + goto out_unlock; + } + if (!__xprt_lock_write(xprt, task)) { err = -EAGAIN; goto out_unlock; @@ -1119,10 +1139,8 @@ goto out_unlock; } - if (list_empty(&req->rq_list)) { + if (list_empty(&req->rq_list)) list_add_tail(&req->rq_list, &xprt->recv); - req->rq_received = 0; - } out_unlock: spin_unlock_bh(&xprt->sock_lock); return err; @@ -1136,7 +1154,6 @@ struct rpc_xprt *xprt = req->rq_xprt; int status, retry = 0; - dprintk("RPC: %4d xprt_transmit(%u)\n", task->tk_pid, req->rq_slen); /* set up everything as needed. 
*/ @@ -1161,8 +1178,10 @@ if (xprt->stream) { req->rq_bytes_sent += status; - if (req->rq_bytes_sent >= req->rq_slen) + if (req->rq_bytes_sent >= req->rq_slen) { + req->rq_bytes_sent = 0; goto out_receive; + } } else { if (status >= req->rq_slen) goto out_receive; @@ -1183,9 +1202,6 @@ * hence there is no danger of the waking up task being put on * schedq, and being picked up by a parallel run of rpciod(). */ - if (req->rq_received) - goto out_release; - task->tk_status = status; switch (status) { @@ -1215,13 +1231,12 @@ if (xprt->stream) xprt_disconnect(xprt); } - out_release: xprt_release_write(xprt, task); - req->rq_bytes_sent = 0; return; out_receive: dprintk("RPC: %4d xmit complete\n", task->tk_pid); /* Set the task's receive timeout value */ + spin_lock_bh(&xprt->sock_lock); if (!xprt->nocong) { task->tk_timeout = rpc_calc_rto(&clnt->cl_rtt, task->tk_msg.rpc_proc->p_timer); @@ -1230,7 +1245,6 @@ task->tk_timeout = req->rq_timeout.to_maxval; } else task->tk_timeout = req->rq_timeout.to_current; - spin_lock_bh(&xprt->sock_lock); /* Don't race with disconnect */ if (!xprt_connected(xprt)) task->tk_status = -ENOTCONN; @@ -1238,7 +1252,6 @@ rpc_sleep_on(&xprt->pending, task, NULL, xprt_timer); __xprt_release_write(xprt, task); spin_unlock_bh(&xprt->sock_lock); - req->rq_bytes_sent = 0; } /* [-- Attachment #3: patch 2/2 --] [-- Type: text/plain, Size: 6592 bytes --] diff -u --recursive --new-file linux-2.5.74-14-soft/include/linux/sunrpc/clnt.h linux-2.5.74-15-mem/include/linux/sunrpc/clnt.h --- linux-2.5.74-14-soft/include/linux/sunrpc/clnt.h 2003-02-13 13:57:46.000000000 +0100 +++ linux-2.5.74-15-mem/include/linux/sunrpc/clnt.h 2003-07-11 00:48:39.000000000 +0200 @@ -35,6 +35,7 @@ */ struct rpc_clnt { atomic_t cl_users; /* number of references */ + atomic_t cl_active; /* number of active calls */ struct rpc_xprt * cl_xprt; /* transport */ struct rpc_procinfo * cl_procinfo; /* procedure info */ u32 cl_maxproc; /* max procedure number */ @@ -57,6 +58,7 @@ struct 
rpc_portmap cl_pmap; /* port mapping */ struct rpc_wait_queue cl_bindwait; /* waiting on getport() */ + wait_queue_head_t cl_waitq; /* wait queue */ int cl_nodelen; /* nodename length */ char cl_nodename[UNX_MAXNODENAME]; @@ -124,6 +126,15 @@ void rpc_clnt_sigmask(struct rpc_clnt *clnt, sigset_t *oldset); void rpc_clnt_sigunmask(struct rpc_clnt *clnt, sigset_t *oldset); void rpc_setbufsize(struct rpc_clnt *, unsigned int, unsigned int); +int rpc_congestion_wait(struct rpc_clnt *); + +static inline void rpc_mark_active(struct rpc_task *task) +{ + struct rpc_clnt *clnt = task->tk_client; + task->tk_active = 1; + if (clnt) + atomic_inc(&clnt->cl_active); +} static __inline__ int rpc_call(struct rpc_clnt *clnt, u32 proc, void *argp, void *resp, int flags) diff -u --recursive --new-file linux-2.5.74-14-soft/net/sunrpc/clnt.c linux-2.5.74-15-mem/net/sunrpc/clnt.c --- linux-2.5.74-14-soft/net/sunrpc/clnt.c 2003-07-10 21:46:14.000000000 +0200 +++ linux-2.5.74-15-mem/net/sunrpc/clnt.c 2003-07-11 00:49:31.000000000 +0200 @@ -127,6 +127,7 @@ clnt->cl_prot = xprt->prot; clnt->cl_stats = program->stats; INIT_RPC_WAITQ(&clnt->cl_bindwait, "bindwait"); + init_waitqueue_head(&clnt->cl_waitq); if (!clnt->cl_port) clnt->cl_autobind = 1; @@ -389,6 +390,37 @@ } /* + * Throttle the number of active RPC requests + */ +int +rpc_congestion_wait(struct rpc_clnt *clnt) +{ + int ret = 0; + DECLARE_WAITQUEUE(wait, current); + + if (atomic_read(&clnt->cl_active) < RPC_MAXCONG) + goto out; + add_wait_queue(&clnt->cl_waitq, &wait); + for (;;) { + if (clnt->cl_intr) + set_current_state(TASK_INTERRUPTIBLE); + else + set_current_state(TASK_UNINTERRUPTIBLE); + if (atomic_read(&clnt->cl_active) < RPC_MAXCONG) + break; + if (clnt->cl_intr && signal_pending(current)) { + ret = -ERESTARTSYS; + break; + } + io_schedule(); + } + current->state = TASK_RUNNING; + remove_wait_queue(&clnt->cl_waitq, &wait); +out: + return ret; +} + +/* * Restart an (async) RPC call. 
Usually called from within the * exit handler. */ diff -u --recursive --new-file linux-2.5.74-14-soft/net/sunrpc/sched.c linux-2.5.74-15-mem/net/sunrpc/sched.c --- linux-2.5.74-14-soft/net/sunrpc/sched.c 2003-06-20 22:16:26.000000000 +0200 +++ linux-2.5.74-15-mem/net/sunrpc/sched.c 2003-07-11 00:47:23.000000000 +0200 @@ -257,13 +257,11 @@ return; } rpc_clear_sleeping(task); - if (waitqueue_active(&rpciod_idle)) - wake_up(&rpciod_idle); + wake_up(&rpciod_idle); } } else { rpc_clear_sleeping(task); - if (waitqueue_active(&task->tk_wait)) - wake_up(&task->tk_wait); + wake_up(&task->tk_wait); } } @@ -276,7 +274,7 @@ /* Don't run a child twice! */ if (RPC_IS_ACTIVATED(task)) return; - task->tk_active = 1; + rpc_mark_active(task); rpc_set_sleeping(task); rpc_make_runnable(task); } @@ -289,8 +287,7 @@ { if(rpciod_pid==0) printk(KERN_ERR "rpciod: wot no daemon?\n"); - if (waitqueue_active(&rpciod_idle)) - wake_up(&rpciod_idle); + wake_up(&rpciod_idle); } /* @@ -315,7 +312,7 @@ /* Mark the task as being activated if so needed */ if (!RPC_IS_ACTIVATED(task)) { - task->tk_active = 1; + rpc_mark_active(task); rpc_set_sleeping(task); } @@ -488,7 +485,8 @@ static int __rpc_execute(struct rpc_task *task) { - int status = 0; + int interruptible = task->tk_client->cl_intr; + int status = 0; dprintk("RPC: %4d rpc_execute flgs %x\n", task->tk_pid, task->tk_flags); @@ -547,14 +545,24 @@ } spin_unlock_bh(&rpc_queue_lock); - while (RPC_IS_SLEEPING(task)) { + if (RPC_IS_SLEEPING(task)) { + DEFINE_WAIT(wait); + /* sync task: sleep here */ dprintk("RPC: %4d sync task going to sleep\n", task->tk_pid); if (current->pid == rpciod_pid) printk(KERN_ERR "RPC: rpciod waiting on sync task!\n"); - __wait_event(task->tk_wait, !RPC_IS_SLEEPING(task)); + prepare_to_wait(&task->tk_wait, &wait, + interruptible ? 
TASK_INTERRUPTIBLE : + TASK_UNINTERRUPTIBLE); + if (likely(RPC_IS_SLEEPING(task))) { + if (likely(!(signalled() && interruptible))) + io_schedule(); + } + finish_wait(&task->tk_wait, &wait); + dprintk("RPC: %4d sync task resuming\n", task->tk_pid); /* @@ -563,7 +571,7 @@ * clean up after sleeping on some queue, we don't * break the loop here, but go around once more. */ - if (task->tk_client->cl_intr && signalled()) { + if (unlikely(signalled() && interruptible)) { dprintk("RPC: %4d got signal\n", task->tk_pid); task->tk_flags |= RPC_TASK_KILLED; rpc_exit(task, -ERESTARTSYS); @@ -620,7 +628,12 @@ goto out_err; } - task->tk_active = 1; + if (task->tk_client) { + status = rpc_congestion_wait(task->tk_client); + if (status < 0) + goto out_release; + } + rpc_mark_active(task); rpc_set_running(task); return __rpc_execute(task); out_release: @@ -818,8 +831,6 @@ /* Remove from any wait queue we're still on */ __rpc_remove_wait_queue(task); - task->tk_active = 0; - spin_unlock_bh(&rpc_queue_lock); /* Synchronously delete any running timer */ @@ -832,6 +843,10 @@ rpcauth_unbindcred(task); rpc_free(task); if (task->tk_client) { + if (task->tk_active) { + atomic_dec(&task->tk_client->cl_active); + wake_up(&task->tk_client->cl_waitq); + } rpc_release_client(task->tk_client); task->tk_client = NULL; } @@ -979,8 +994,20 @@ } if (!rpciod_task_pending()) { + DEFINE_WAIT(wait); + dprintk("RPC: rpciod back to sleep\n"); - wait_event_interruptible(rpciod_idle, rpciod_task_pending()); + + prepare_to_wait(&rpciod_idle, &wait, TASK_INTERRUPTIBLE); + spin_lock_bh(&rpc_queue_lock); + if (likely(!rpciod_task_pending())) { + spin_unlock_bh(&rpc_queue_lock); + if (likely(!signalled())) + schedule(); + } else + spin_unlock_bh(&rpc_queue_lock); + finish_wait(&rpciod_idle, &wait); + dprintk("RPC: switch to rpciod\n"); rounds = 0; } ^ permalink raw reply [flat|nested] 12+ messages in thread
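The throttling idea in patch 2/2 — cap outstanding asynchronous requests at the transport's slot count so bursts cannot exhaust socket buffers and drop replies — reduces to a bounded admission counter. A userspace sketch, with illustrative names (the kernel version uses an atomic cl_active counter and sleeps on cl_waitq rather than refusing admission):

```c
#include <stdbool.h>

#define RPC_MAXCONG 16   /* the xprt slot-table size mentioned above */

struct throttle {
    int active;          /* currently outstanding async requests */
};

/* Admit a new async request, or tell the caller to wait.
 * (Kernel: rpc_congestion_wait() sleeps until a slot frees up.) */
static bool throttle_start(struct throttle *t)
{
    if (t->active >= RPC_MAXCONG)
        return false;
    t->active++;
    return true;
}

/* Release a slot when the call completes.
 * (Kernel: atomic_dec() then wake_up(&clnt->cl_waitq).) */
static void throttle_done(struct throttle *t)
{
    t->active--;
}
```

The 17th concurrent request is held back until one of the first 16 completes, which is exactly the back-pressure the patch adds for async writeback.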
* Re: NFS client errors with 2.5.74? 2003-07-10 23:36 ` Trond Myklebust @ 2003-07-12 15:13 ` Jamie Lokier 2003-07-13 13:23 ` Jamie Lokier 0 siblings, 1 reply; 12+ messages in thread From: Jamie Lokier @ 2003-07-12 15:13 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-kernel Trond Myklebust wrote: > The first one should fix the problem of the kernel missing replies > while we are busy trying to resend a request. This by itself doesn't fix the problem of too-fast timeout errors on soft mounts (e.g. returning EIO within <0.1s). I am still seeing the fs get into a state where each time a large file is written, it reports EIO (but writes successfully anyway). And "ls -R" still shows EIO errors also. > The second, solves a problem of resource starvation. The fact that we > can currently just submit arbitrary numbers of asynchronous requests > means that we can exhaust resources to the point where the socket > starts dropping replies. > This patch limits the number of outstanding asynchronous requests to > 16 per socket (the maximum number of xprt/transport slots). I haven't tried this yet. It doesn't apply to 2.5.74 due to the calls to io_schedule(). - Jamie ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: NFS client errors with 2.5.74? 2003-07-12 15:13 ` Jamie Lokier @ 2003-07-13 13:23 ` Jamie Lokier 0 siblings, 0 replies; 12+ messages in thread From: Jamie Lokier @ 2003-07-13 13:23 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-kernel Jamie Lokier wrote: > I haven't tried this yet. It doesn't apply to 2.5.74 due to the calls > to io_schedule(). I've tried both patches together now, on 2.5.75. Still seeing fast (<0.1s) timeouts with "soft". Using "hard" seems fairly reliable, but that was the same without the patches. -- Jamie ^ permalink raw reply [flat|nested] 12+ messages in thread
* NFS client errors with 2.5.74? @ 2003-07-10 10:30 Michael Frank 0 siblings, 0 replies; 12+ messages in thread From: Michael Frank @ 2003-07-10 10:30 UTC (permalink / raw) To: jamie; +Cc: linux-kernel I chewed on that a while ago and was advised not to use "soft" mounts. My 2.4<>2.4 setup worked with "soft" for ages, but it broke when running 2.5.6x/7x on the server. If you are using soft mounts, use "hard,intr" instead. Regards Michael -- Powered by linux-2.5.74-mm3. Compiled with gcc-2.95-3 - mature and rock solid My current linux related activities: - 2.5 yenta_socket testing - Test development and testing of swsusp for 2.4/2.5 and ACPI S3 of 2.5 kernel - Everyday usage of 2.5 kernel More info on 2.5 kernel: http://www.codemonkey.org.uk/post-halloween-2.5.txt More info on swsusp: http://sourceforge.net/projects/swsusp/ ^ permalink raw reply [flat|nested] 12+ messages in thread
* NFS client errors with 2.5.74? @ 2003-07-10 5:41 Jamie Lokier 2003-07-10 6:07 ` Jamie Lokier 2003-07-10 9:25 ` Felipe Alfaro Solana 0 siblings, 2 replies; 12+ messages in thread From: Jamie Lokier @ 2003-07-10 5:41 UTC (permalink / raw) To: linux-kernel, trond.myklebust I'm seeing quite a lot of NFS client errors with 2.5.74, connected to a server running 2.4.20-18.9 (Red Hat 9's current kernel). All of the errors that I've observed take the form of a write() or close() returning EIO. rsync seems to have a particularly tough time - could the unusual size of blocks which rsync writes be relevant? There are some read errors too, as Mozilla failed to find my profile claiming it couldn't read the file (when I restarted Mozilla, it found it the second time), and Gnome Terminal was unable to read its preferences file, but I didn't catch any specific read() errors. I tried this command to see if there is a protocol error while running Ethereal: [jamie@dual jamie]$ cp .mirMail.bjl1/INBOX .mirMail.bjl1/JBOX cp: closing `.mirMail.bjl1/JBOX': Input/output error Ethereal shows a long series of NFS WRITEs followed by a single COMMIT (as expected), and nothing after that. All of those replies had status 0, OK - including the reply to the final COMMIT. Using "md5sum" I checked on both the client and server, and the contents of "JBOX" have been written correctly despite the "Input/output error" from "cp". When I looked at the packets from rsync commands that I saw getting EIO from write(), I saw that all requests except LOOKUP returned with status 0 as well, and the LOOKUPs that failed were simply checking whether a ".nfs..." name exists before renaming another file to that name. So as far as I can tell, there is no problem with the packets being sent and received. There is, however, a big problem with error reporting on the client side. A few extra notes: - Running NFS v3. - Kernel is vanilla 2.5.74, dual Athlon 1800MP, 768MB RAM - Chipset: AMD-760MP/AMD-768. Board: Asus A7M266-D. 
- I _think_ I noticed this problem once when running 2.4.20-18.9 on the client, but generally it is not a problem. I have been mounting my home directory over NFS for weeks with 2.4.20-18.9 as client, so I would have noticed networking or server problems. - Every so often, the client's kernel log gets: kernel: nfs: server 192.168.1.1 not responding, timed out I haven't caught one of those with Ethereal to find out what's going on the wire. There are no other regular kernel messages. Note! These are all logged at different times to the EIOs, different as in minutes away. The EIOs, btw, are reported immediately; there is no pause waiting for a response from the server. Any idea about this? Is it a known problem? Cheers, -- Jamie ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: NFS client errors with 2.5.74? 2003-07-10 5:41 Jamie Lokier @ 2003-07-10 6:07 ` Jamie Lokier 2003-07-10 10:15 ` Trond Myklebust 2003-07-10 9:25 ` Felipe Alfaro Solana 1 sibling, 1 reply; 12+ messages in thread From: Jamie Lokier @ 2003-07-10 6:07 UTC (permalink / raw) To: linux-kernel, trond.myklebust Some more information: The kernel messages "kernel: nfs: server 192.168.1.1 not responding, timed out" do have some relationship with the EIO errors after all. When I "ls" a directory for the first time (i.e. it's not in cache), I get an EIO _every time_. It's the getdents64() call which returns EIO. The second and subsequent times I list that directory, the listing is fine. However if I pick another directory which isn't in cache yet, getdents64() returns EIO. A packet trace shows something interesting: duplicate requests. In this case, I see four (4) READDIRPLUS requests with identical XIDs. Ethereal says that all four are sent in 0.04 seconds. Then I see four replies, of course with identical XIDs too. The replies all have status OK. But four duplicate requests is mighty suspicious. -- Jamie ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: NFS client errors with 2.5.74? 2003-07-10 6:07 ` Jamie Lokier @ 2003-07-10 10:15 ` Trond Myklebust 2003-07-10 16:14 ` Frank Cusack 0 siblings, 1 reply; 12+ messages in thread From: Trond Myklebust @ 2003-07-10 10:15 UTC (permalink / raw) To: linux-kernel >>>>> " " == Jamie Lokier <jamie@shareable.org> writes: > - Every so often, the client's kernel log gets: > kernel: nfs: server 192.168.1.1 not responding, timed out Sigh... I hate soft mounts... Have I said that before? 8-) > Any idea about this? Is it a known problem? I can never guarantee you perfect service with soft mounts (a 5 second network patition/server congestion is all it takes) but I do have a patch that just went into 2.4.22 that backs out some of the Van Jacobson exponential backoff changes. This helps stabilize things a lot. I haven't yet had time to port that patch to 2.5.x, but the code should be pretty much identical, so if you want to give it a go, then here it is... Cheers, Trond diff -u --recursive --new-file linux-2.4.21-20-rdplus/net/sunrpc/clnt.c linux-2.4.21-21-soft/net/sunrpc/clnt.c --- linux-2.4.21-20-rdplus/net/sunrpc/clnt.c 2003-06-15 14:49:35.000000000 -0700 +++ linux-2.4.21-21-soft/net/sunrpc/clnt.c 2003-06-17 10:38:56.000000000 -0700 @@ -709,14 +709,14 @@ dprintk("RPC: %4d call_timeout (major)\n", task->tk_pid); if (clnt->cl_softrtry) { - if (clnt->cl_chatty && !task->tk_exit) + if (clnt->cl_chatty) printk(KERN_NOTICE "%s: server %s not responding, timed out\n", clnt->cl_protname, clnt->cl_server); rpc_exit(task, -EIO); return; } - if (clnt->cl_chatty && !(task->tk_flags & RPC_CALL_MAJORSEEN) && rpc_ntimeo(&clnt->cl_rtt) > 7) { + if (clnt->cl_chatty && !(task->tk_flags & RPC_CALL_MAJORSEEN)) { task->tk_flags |= RPC_CALL_MAJORSEEN; printk(KERN_NOTICE "%s: server %s not responding, still trying\n", clnt->cl_protname, clnt->cl_server); diff -u --recursive --new-file linux-2.4.21-20-rdplus/net/sunrpc/xprt.c linux-2.4.21-21-soft/net/sunrpc/xprt.c --- 
linux-2.4.21-20-rdplus/net/sunrpc/xprt.c 2003-06-15 15:14:14.000000000 -0700 +++ linux-2.4.21-21-soft/net/sunrpc/xprt.c 2003-06-17 09:48:18.000000000 -0700 @@ -1046,21 +1046,6 @@ } /* - * Exponential backoff for UDP retries - */ -static inline int -xprt_expbackoff(struct rpc_task *task, struct rpc_rqst *req) -{ - int backoff; - - req->rq_ntimeo++; - backoff = min(rpc_ntimeo(&task->tk_client->cl_rtt), XPRT_MAX_BACKOFF); - if (req->rq_ntimeo < (1 << backoff)) - return 1; - return 0; -} - -/* * RPC receive timeout handler. */ static void @@ -1073,14 +1058,7 @@ if (req->rq_received) goto out; - if (!xprt->nocong) { - if (xprt_expbackoff(task, req)) { - rpc_add_timer(task, xprt_timer); - goto out_unlock; - } - rpc_inc_timeo(&task->tk_client->cl_rtt); - xprt_adjust_cwnd(req->rq_xprt, -ETIMEDOUT); - } + xprt_adjust_cwnd(req->rq_xprt, -ETIMEDOUT); req->rq_nresend++; dprintk("RPC: %4d xprt_timer (%s request)\n", @@ -1090,7 +1068,6 @@ out: task->tk_timeout = 0; rpc_wake_up_task(task); -out_unlock: spin_unlock(&xprt->sock_lock); } @@ -1228,16 +1205,17 @@ return; out_receive: dprintk("RPC: %4d xmit complete\n", task->tk_pid); + spin_lock_bh(&xprt->sock_lock); /* Set the task's receive timeout value */ if (!xprt->nocong) { task->tk_timeout = rpc_calc_rto(&clnt->cl_rtt, rpcproc_timer(clnt, task->tk_msg.rpc_proc)); - req->rq_ntimeo = 0; + task->tk_timeout <<= clnt->cl_timeout.to_retries + - req->rq_timeout.to_retries; if (task->tk_timeout > req->rq_timeout.to_maxval) task->tk_timeout = req->rq_timeout.to_maxval; } else task->tk_timeout = req->rq_timeout.to_current; - spin_lock_bh(&xprt->sock_lock); /* Don't race with disconnect */ if (!xprt_connected(xprt)) task->tk_status = -ENOTCONN; ^ permalink raw reply [flat|nested] 12+ messages in thread
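The exponential backoff that this 2.4 patch removes (the deleted xprt_expbackoff()) swallowed minor timeouts until a request had timed out 2^backoff times. A self-contained sketch of that test — restructured for userspace, with a helper min_int() standing in for the kernel's min() macro:

```c
#define XPRT_MAX_BACKOFF 8   /* cap on the backoff exponent */

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Mirror of the removed logic: bump the request's timeout count and
 * return 1 ("swallow this timeout, re-arm the timer") while the count
 * is still below 2^backoff; return 0 once the budget is exhausted. */
static int expbackoff(int *rq_ntimeo, int rtt_ntimeo)
{
    int backoff = min_int(rtt_ntimeo, XPRT_MAX_BACKOFF);

    (*rq_ntimeo)++;
    return *rq_ntimeo < (1 << backoff) ? 1 : 0;
}
```

With rtt_ntimeo at 2 the first three timeouts are absorbed and only the fourth triggers retransmission/congestion handling; backing this out means every timeout immediately adjusts the congestion window and retransmits, which is what stabilized soft mounts in 2.4.22.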
* Re: NFS client errors with 2.5.74? 2003-07-10 10:15 ` Trond Myklebust @ 2003-07-10 16:14 ` Frank Cusack 0 siblings, 0 replies; 12+ messages in thread From: Frank Cusack @ 2003-07-10 16:14 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-kernel On Thu, Jul 10, 2003 at 12:15:24PM +0200, Trond Myklebust wrote: > >>>>> " " == Jamie Lokier <jamie@shareable.org> writes: > > - Every so often, the client's kernel log gets: > > kernel: nfs: server 192.168.1.1 not responding, timed out > > Sigh... I hate soft mounts... Have I said that before? 8-) Is "timed out" the key that this is a soft mount? Trond, I would suggest that whenever you log this, you also log a message "... do not report a bug, use a hard mount" or some such. :-) /fc ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: NFS client errors with 2.5.74? 2003-07-10 5:41 Jamie Lokier 2003-07-10 6:07 ` Jamie Lokier @ 2003-07-10 9:25 ` Felipe Alfaro Solana 1 sibling, 0 replies; 12+ messages in thread From: Felipe Alfaro Solana @ 2003-07-10 9:25 UTC (permalink / raw) To: Jamie Lokier; +Cc: LKML, trond.myklebust On Thu, 2003-07-10 at 07:41, Jamie Lokier wrote: > I'm seeing quite a lot of NFS client errors with 2.5.74, connected to > a server running 2.4.20-18.9 (Red Hat 9's current kernel). > > All of the errors that I've observed the form of a write() or close() > returning EIO. rsync seeems to have a particularly tough time - > could the unusual size of blocks which rsync writes be relevant? > > There are some read errors too, as Mozilla failed to find my profile > claiming it couldn't read the file (when I restarted Mozilla, it found > it the second time), and Gnome Terminal was unable to read its > preferences file, but I didn't catch any specific read() errors. > > I tried this command to see if the is a protocol error while running > Ethereal: > > [jamie@dual jamie]$ cp .mirMail.bjl1/INBOX .mirMail.bjl1/JBOX > cp: closing `.mirMail.bjl1/JBOX': Input/output error Any chance you are using "hard" NFS mounts? ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2003-07-13 13:08 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20030710053944.GA27038@mail.jlokier.co.uk> [not found] ` <16141.15245.367725.364913@charged.uio.no> 2003-07-10 15:04 ` NFS client errors with 2.5.74? Jamie Lokier [not found] ` <20030710150012.GA29113@mail.jlokier.co.uk> [not found] ` <16141.32852.39625.891724@charged.uio.no> 2003-07-10 15:35 ` Jamie Lokier 2003-07-10 15:45 ` Trond Myklebust 2003-07-10 23:36 ` Trond Myklebust 2003-07-12 15:13 ` Jamie Lokier 2003-07-13 13:23 ` Jamie Lokier 2003-07-10 10:30 Michael Frank -- strict thread matches above, loose matches on Subject: below -- 2003-07-10 5:41 Jamie Lokier 2003-07-10 6:07 ` Jamie Lokier 2003-07-10 10:15 ` Trond Myklebust 2003-07-10 16:14 ` Frank Cusack 2003-07-10 9:25 ` Felipe Alfaro Solana