linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* handling NFSERR_JUKEBOX
@ 2001-07-10  0:23 Bob Smart
  2001-07-13 14:06 ` Pete Wyckoff
  0 siblings, 1 reply; 10+ messages in thread
From: Bob Smart @ 2001-07-10  0:23 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-kernel, jeroen, smart

The latest version of unicos has started to return NFSERR_JUKEBOX
when it has to retrieve a file from tape. I found the following
relevant discussion on the netbsd list:
http://mail-index.netbsd.org/tech-kern/1999/03/16/0002.html
To save you looking the key point is:

  >Not sure. Is there a way that the server can say, "I got your request,
  >but I'm too busy now, try again in a little bit." ??

  Isn't this what NFSERR_JUKEBOX is for?

  AFAIK, the protocol goes something like this:

  Client sends a request.
  Server starts loading tape/optical disk/whatever.
  Client resends request.
  Server notices that this is a repeat of an earlier request which is already
  in the "slow queue", and replies NFSERR_JUKEBOX (= "be patient, I'll send
  the response eventually").
  Client shuts up and waits.
  Server completes request and sends response to client.

It seems that what the linux client is doing is returning error 528
to the user program (cp is giving this error message). From 
linux/errno.h:

#define EJUKEBOX  528     /* Request initiated, but will not complete 
                             before timeout */

This is wrong because the nfs file system is hard mounted in my case
- there is no timeout.

While it would be nice to do a perfect solution, it looks like
a quick fix is to just ignore NFSERR_JUKEBOX from the server.

Bob Smart

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: handling NFSERR_JUKEBOX
  2001-07-10  0:23 handling NFSERR_JUKEBOX Bob Smart
@ 2001-07-13 14:06 ` Pete Wyckoff
  2001-11-06  5:21   ` Red Hat needs this patch (was Re: handling NFSERR_JUKEBOX) Bob Smart
  0 siblings, 1 reply; 10+ messages in thread
From: Pete Wyckoff @ 2001-07-13 14:06 UTC (permalink / raw)
  To: Bob Smart; +Cc: trond.myklebust, linux-kernel, jeroen, smart

smart@hpc.CSIRO.AU said:
> http://mail-index.netbsd.org/tech-kern/1999/03/16/0002.html
> To save you looking the key point is:
> 
>   >Not sure. Is there a way that the server can say, "I got your request,
>   >but I'm too busy now, try again in a little bit." ??
> 
>   Isn't this what NFSERR_JUKEBOX is for?
> 
>   AFAIK, the protocol goes something like this:
> 
>   Client sends a request.
>   Server starts loading tape/optical disk/whatever.
>   Client resends request.
>   Server notices that this is a repeat of an earlier request which is already
>   in the "slow queue", and replies NFSERR_JUKEBOX (= "be patient, I'll send
>   the response eventually").
>   Client shuts up and waits.
>   Server completes request and sends response to client.
> 
> It seems that what the linux client is doing is returning error 528
> to the user program (cp is giving this error message). From 
> linux/errno.h:
> 
> #define EJUKEBOX  528     /* Request initiated, but will not complete 
>                              before timeout */
> 
> This is wrong because the nfs file system is hard mounted in my case
> - there is no timeout.
> 
> While it would be nice to do a perfect solution, it looks like
> a quick fix is to just ignore NFSERR_JUKEBOX from the server.

The comment is incorrect.  Take a look at the RFC for NFSv3.  There is
no duplicate request sent from the client for a JUKEBOX-supporting
server.  The first request is satisfied by the return of EJUKEBOX, after
which it is up to the client to re-request the file.  Ignoring it is not
an option because the request is terminated; a retry does not make
sense.  A new request must be initiated by the client later.

Take a look at the nfs sourceforge mailing list.  There's a long thread
about this in the context of a Solaris HSM server.  I wrote a patch to
try to do the right thing on a linux client, but it is not perfect for
many reasons: http://devrandom.net/lists/archives/2001/6/NFS/0135.html

		-- Pete

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Red Hat needs this patch (was Re: handling NFSERR_JUKEBOX)
  2001-07-13 14:06 ` Pete Wyckoff
@ 2001-11-06  5:21   ` Bob Smart
  2001-11-06  9:05     ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Bob Smart @ 2001-11-06  5:21 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: trond.myklebust, linux-kernel, alan

> Take a look at the nfs sourceforge mailing list.  There's a long thread
> about this in the context of a Solaris HSM server.  I wrote a patch to
> try to do the right thing on a linux client, but it is not perfect for
> many reasons: http://devrandom.net/lists/archives/2001/6/NFS/0135.html

Thanks again for this patch. I've appended our current version to
save others the trouble of minor modifications. I've just tested
it against 2.4.13-ac7. 

Your patch may not be perfect but it is a million times better than
nothing. We have had no problems with it. Linux has a long tradition 
of utilizing imperfect software till something better comes along.

This patch is ESSENTIAL to make Linux useable in large enterprises,
because they increasingly commonly utilize hirearchically managed
storage. I strongly urge Alan/Linus/Macello to include this patch
in the stable kernel tree (but marked experimental). Or at least
can the kernel maintainers at Red Hat and other distributions put
it in their offerings: surely you see that you need this for your
big customers.

I understand that some people think the problem should be addressed 
in glibc (+ all other places that do open system calls). Well really
this is just a small part of a much wider problem. I've included a
rant below. The bottom line is that default "wait as long as necessary"
behaviour should be provided by the kernel, but there needs to be an
optional mechanism that lets the user know when things will take longer
than expected.

Cheers,

Bob

-----------------------------------------------------------------------------

Handling long latency events
----------------------------

The linux kernel potentially returns the error EJUKEBOX from
some operations. This error means: "This operation might well
succeed eventually, and I've started the task [e.g. get
file from tape] that will cause it to later succeed. However
since it will take a long time I'm returning this error so
that you can: inform the user; wait a while; then retry the op."
Now that NFS servers are starting to use NFSERR_JUKEBOX returns,
(which get turned into EJUKEBOX if you don't have the Wyckoff
patch) this is starting to happen and we find that linux user
mode software doesn't handle it.

Getting a file from tape is just one of many circumstances where
operations sometimes take an unexpectedly long time. Other cases
typcially revolve around network operations where remote servers
are not responding for one reason or another. As use of network
services increases this problem is getting worse.

We need to be able to handle these situations in 2 different ways.
If it is a batch or background operation then most of the time
we just want it to wait. Typical example: a telnet server shouldn't
drop a terminal session just because the client has been out of
touch for an hour because of network problems. On the other hand
if the software is interacting with the user and suddenly there
are delays, then we want to give the user useful feedback and
also allow the user to kill the operation. A typical example is
a telnet client [and the (old?) windows behaviour of locking up
is particularly annoying].

It seems to me that the right unixy solution to this problem is
through signals. The EJUKEBOX idea is regretable. However it has
happened and we need to handle it. I strongly urge that:

 - default behaviour is for the kernel to wait and never return 
   EJUKEBOX.

 - however user programs that want to handle EJUKEBOX should
   be able to put themselves into a mode where EJUKEBOX is returned.

This approach has the advantage of minimizing impact on existing
user mode software. If you accept my argument then Pete Wyckoff's
patch is a step in the right direction.

-----------------------------------------------------------------------------

diff -r -u --exclude='*.orig' SAVE/linux-2.4.13-0.3/Documentation/Configure.help linux-2.4.13-0.3+jukebox/Documentation/Configure.help
--- SAVE/linux-2.4.13-0.3/Documentation/Configure.help	Tue Oct 30 09:19:01 2001
+++ linux-2.4.13-0.3+jukebox/Documentation/Configure.help	Sun Nov  4 14:03:59 2001
@@ -14094,6 +14094,23 @@
 
   If unsure, say N.
 
+Auto-retry jukebox error requests (EXPERIMENTAL)
+CONFIG_NFS_V3_JUKEBOX_RETRY
+  Say Y here if you want your NFS client to automatically retry requests
+  to which the server responded with EJUKEBOX (528).  Some NFS servers
+  export filesystems which are hierarchical, in that some files may take
+  much longer than others to read or write.  These files may be migrated
+  to a slow tape storage system which serves as the bulk capacity for the
+  file system; the fast disk becomes essentially a cache.
+
+  If you try to read a file which has been migrated to tape, the server
+  may return the error EJUKEBOX and begin the process which brings the
+  file from tape to disk.  This option enables the kernel to retry
+  the access until the file is ready, rather than passing the error to
+  userspace applications, which is the default.
+ 
+  If unsure, say N.
+
 Root file system on NFS
 CONFIG_ROOT_NFS
   If you want your Linux box to mount its whole root file system (the
diff -r -u --exclude='*.orig' SAVE/linux-2.4.13-0.3/fs/Config.in linux-2.4.13-0.3+jukebox/fs/Config.in
--- SAVE/linux-2.4.13-0.3/fs/Config.in	Tue Oct 30 09:18:42 2001
+++ linux-2.4.13-0.3+jukebox/fs/Config.in	Sun Nov  4 14:03:59 2001
@@ -95,6 +95,7 @@
    dep_tristate 'InterMezzo file system support (experimental, replicating fs)' CONFIG_INTERMEZZO_FS $CONFIG_INET $CONFIG_EXPERIMENTAL
    dep_tristate 'NFS file system support' CONFIG_NFS_FS $CONFIG_INET
    dep_mbool '  Provide NFSv3 client support' CONFIG_NFS_V3 $CONFIG_NFS_FS
+   dep_mbool '    Auto-retry jukebox error requests' CONFIG_NFS_V3_JUKEBOX_RETRY $CONFIG_NFS_V3
    dep_bool '  Root file system on NFS' CONFIG_ROOT_NFS $CONFIG_NFS_FS $CONFIG_IP_PNP
 
    dep_tristate 'NFS server support' CONFIG_NFSD $CONFIG_INET
diff -r -u --exclude='*.orig' SAVE/linux-2.4.13-0.3/fs/nfs/file.c linux-2.4.13-0.3+jukebox/fs/nfs/file.c
--- SAVE/linux-2.4.13-0.3/fs/nfs/file.c	Mon Sep 24 02:48:01 2001
+++ linux-2.4.13-0.3+jukebox/fs/nfs/file.c	Sun Nov  4 15:17:14 2001
@@ -192,8 +192,19 @@
 	if (!inode)
 		return 0;
 
+#ifdef CONFIG_NFS_V3_JUKEBOX_RETRY
+        /* perhaps nfsv3 server returned jukebox, go try again */
+        /* if (PageError(page) && inode->u.nfs_i.async_error == -EJUKEBOX) {
+                result = nfs_readpage_sync(inode->u.nfs_i.file, inode, page); */
+        if (PageError(page) && NFS_I(inode)->async_error == -EJUKEBOX) {
+                result = nfs_readpage_sync(NFS_I(inode)->file, inode, page);
+        } else
+#endif
+        {
 	rpages = NFS_SERVER(inode)->rpages;
 	result = nfs_pagein_inode(inode, index, rpages);
+        }
+
 	if (result < 0)
 		return result;
 	return 0;
diff -r -u --exclude='*.orig' SAVE/linux-2.4.13-0.3/fs/nfs/nfs3proc.c linux-2.4.13-0.3+jukebox/fs/nfs/nfs3proc.c
--- SAVE/linux-2.4.13-0.3/fs/nfs/nfs3proc.c	Tue Oct  2 06:45:37 2001
+++ linux-2.4.13-0.3+jukebox/fs/nfs/nfs3proc.c	Sun Nov  4 16:28:09 2001
@@ -4,6 +4,8 @@
  *  Client-side NFSv3 procedures stubs.
  *
  *  Copyright (C) 1997, Olaf Kirch
+ *
+ *  21 Jun 01 - Retry NFSv3 EJUKEBOX errors by Pete Wyckoff 
  */
 
 #include <linux/mm.h>
@@ -56,10 +58,25 @@
 	struct nfs3_sattrargs	arg = { NFS_FH(inode), sattr, 0, 0 };
 	int	status;
 
+        for (;;) {
 	dprintk("NFS call  setattr\n");
 	fattr->valid = 0;
-	status = rpc_call(NFS_CLIENT(inode), NFS3PROC_SETATTR, &arg, fattr, 0);
-	dprintk("NFS reply setattr\n");
+                status = rpc_call(NFS_CLIENT(inode), NFS3PROC_SETATTR, &arg,
+                    fattr, 0);
+                dprintk("NFS reply setattr %d\n", status);
+#ifdef CONFIG_NFS_V3_JUKEBOX_RETRY
+                if (status == -EJUKEBOX) {
+                        DECLARE_WAIT_QUEUE_HEAD(jukebox_wq);
+                        tty_write_message(current->tty,
+                          "NFSv3: write waiting for file unmigration\r\n");
+                        if (interruptible_sleep_on_timeout(&jukebox_wq,
+                            NFS_JUKEBOX_RETRY_TIMEO) == 0)
+                                continue;  /* but not if interrupted */
+                }
+#endif
+                break;
+        }
+
 	return status;
 }
 
@@ -147,10 +164,24 @@
 	struct rpc_message	msg = { NFS3PROC_READ, &arg, &res, cred };
 	int			status;
 
+        for (;;) {
 	dprintk("NFS call  read %d @ %Ld\n", count, (long long)offset);
 	fattr->valid = 0;
 	status = rpc_call_sync(NFS_CLIENT(inode), &msg, flags);
 	dprintk("NFS reply read: %d\n", status);
+#ifdef CONFIG_NFS_V3_JUKEBOX_RETRY
+                if (status == -EJUKEBOX) {
+                        DECLARE_WAIT_QUEUE_HEAD(jukebox_wq);
+                        tty_write_message(current->tty,
+                          "NFSv3: read waiting for file unmigration\r\n");
+                        if (interruptible_sleep_on_timeout(&jukebox_wq,
+                            NFS_JUKEBOX_RETRY_TIMEO) == 0)
+                                continue;  /* but not if interrupted */
+                }
+#endif
+                break;
+        }
+
 	*eofp = res.eof;
 	return status;
 }
@@ -177,7 +208,7 @@
 
 	status = rpc_call_sync(NFS_CLIENT(inode), &msg, rpcflags);
 
-	dprintk("NFS reply read: %d\n", status);
+	dprintk("NFS reply write: %d\n", status);
 	return status < 0? status : res.count;
 }
 
diff -r -u --exclude='*.orig' SAVE/linux-2.4.13-0.3/fs/nfs/read.c linux-2.4.13-0.3+jukebox/fs/nfs/read.c
--- SAVE/linux-2.4.13-0.3/fs/nfs/read.c	Tue Oct 30 09:18:42 2001
+++ linux-2.4.13-0.3+jukebox/fs/nfs/read.c	Sun Nov  4 15:28:38 2001
@@ -81,7 +81,7 @@
 /*
  * Read a page synchronously.
  */
-static int
+int
 nfs_readpage_sync(struct file *file, struct inode *inode, struct page *page)
 {
 	struct rpc_cred	*cred = NULL;
@@ -449,11 +449,27 @@
 		if (task->tk_status >= 0 && count >= 0) {
 			SetPageUptodate(page);
 			count -= PAGE_CACHE_SIZE;
-		} else
+			flush_dcache_page(page);
+                        kunmap(page);
+                        UnlockPage(page);
+		} else {
 			SetPageError(page);
 		flush_dcache_page(page);
 		kunmap(page);
+#ifdef CONFIG_NFS_V3_JUKEBOX_RETRY
+                        /* do not unlock page, let nfs_sync_page retry */
+                        if (task->tk_status == -EJUKEBOX) {
+                                NFS_I(inode)->async_error = task->tk_status;
+                                if (waitqueue_active(&page->wait))
+                                        wake_up(&page->wait);
+                                else  /* nobody waiting, just unlock */
 		UnlockPage(page);
+                        } else
+#endif
+                        {
+                                UnlockPage(page);
+                        }
+                }
 
 		dprintk("NFS: read (%x/%Ld %d@%Ld)\n",
                         req->wb_inode->i_dev,
@@ -504,8 +520,13 @@
 		goto out_error;
 
 	error = -1;
-	if (!PageError(page) && NFS_SERVER(inode)->rsize >= PAGE_CACHE_SIZE)
+	if (!PageError(page) && NFS_SERVER(inode)->rsize >= PAGE_CACHE_SIZE) {
+#ifdef CONFIG_NFS_V3_JUKEBOX_RETRY
+                NFS_I(inode)->file = file;
+                NFS_I(inode)->async_error = 0;
+#endif
 		error = nfs_readpage_async(file, inode, page);
+	}
 	if (error >= 0)
 		goto out;
 
diff -r -u --exclude='*.orig' SAVE/linux-2.4.13-0.3/include/linux/nfs_fs.h linux-2.4.13-0.3+jukebox/include/linux/nfs_fs.h
--- SAVE/linux-2.4.13-0.3/include/linux/nfs_fs.h	Tue Oct 30 10:04:16 2001
+++ linux-2.4.13-0.3+jukebox/include/linux/nfs_fs.h	Sun Nov  4 14:09:34 2001
@@ -104,6 +104,9 @@
 /* Inode Flags */
 #define NFS_USE_READDIRPLUS(inode)	((NFS_FLAGS(inode) & NFS_INO_ADVISE_RDPLUS) ? 1 : 0)
 
+/* Jukebox retry interval */
+#define NFS_JUKEBOX_RETRY_TIMEO		(10 * HZ)
+
 /*
  * These are the default flags for swap requests
  */
@@ -258,6 +261,7 @@
 extern int  nfs_readpage(struct file *, struct page *);
 extern int  nfs_pagein_inode(struct inode *, unsigned long, unsigned int);
 extern int  nfs_pagein_timeout(struct inode *);
+extern int  nfs_readpage_sync(struct file *, struct inode *, struct page *);
 
 /*
  * linux/fs/mount_clnt.c
diff -r -u --exclude='*.orig' SAVE/linux-2.4.13-0.3/include/linux/nfs_fs_i.h linux-2.4.13-0.3+jukebox/include/linux/nfs_fs_i.h
--- SAVE/linux-2.4.13-0.3/include/linux/nfs_fs_i.h	Tue Oct 30 10:03:46 2001
+++ linux-2.4.13-0.3+jukebox/include/linux/nfs_fs_i.h	Sun Nov  4 14:09:25 2001
@@ -75,6 +75,11 @@
 
 	/* Credentials for shared mmap */
 	struct rpc_cred		*mm_cred;
+
+#ifdef CONFIG_NFS_V3_JUKEBOX_RETRY
+        struct file *file;
+        int async_error;
+#endif
 };
 
 /*


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Red Hat needs this patch (was Re: handling NFSERR_JUKEBOX)
  2001-11-06  5:21   ` Red Hat needs this patch (was Re: handling NFSERR_JUKEBOX) Bob Smart
@ 2001-11-06  9:05     ` Trond Myklebust
  2001-11-06  9:10       ` Marcelo Tosatti
                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Trond Myklebust @ 2001-11-06  9:05 UTC (permalink / raw)
  To: Bob Smart; +Cc: Pete Wyckoff, linux-kernel, alan

>>>>> " " == Bob Smart <smart@hpc.CSIRO.AU> writes:

    >> Take a look at the nfs sourceforge mailing list.  There's a
    >> long thread about this in the context of a Solaris HSM server.
    >> I wrote a patch to try to do the right thing on a linux client,
    >> but it is not perfect for many reasons:
    >> http://devrandom.net/lists/archives/2001/6/NFS/0135.html

     > Thanks again for this patch. I've appended our current version
     > to save others the trouble of minor modifications. I've just
     > tested it against 2.4.13-ac7.

     > Your patch may not be perfect but it is a million times better
     > than nothing. We have had no problems with it. Linux has a long
     > tradition of utilizing imperfect software till something better
     > comes along.

     > This patch is ESSENTIAL to make Linux useable in large
     > enterprises, because they increasingly commonly utilize
     > hirearchically managed storage. I strongly urge
     > Alan/Linus/Macello to include this patch in the stable kernel
     > tree (but marked experimental). Or at least can the kernel
     > maintainers at Red Hat and other distributions put it in their
     > offerings: surely you see that you need this for your big
     > customers.

The patch may appear to work, but it fails to understand the nature of
the problem. The problem with NFSERR_JUKEBOX is that is can appear in
more or less *any* NFSv3 RPC call. That includes everything from
lookups to writes.

If all you do is intercept read and setattr, then you certainly
haven't achieved 'enterprise quality'.

Cheers,
   Trond

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Red Hat needs this patch (was Re: handling NFSERR_JUKEBOX)
  2001-11-06  9:05     ` Trond Myklebust
@ 2001-11-06  9:10       ` Marcelo Tosatti
  2001-11-06 10:36       ` Bob Smart
  2001-11-06 10:53       ` Red Hat needs this patch (was Re: handling NFSERR_JUKEBOX) Trond Myklebust
  2 siblings, 0 replies; 10+ messages in thread
From: Marcelo Tosatti @ 2001-11-06  9:10 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Bob Smart, Pete Wyckoff, linux-kernel, alan



On 6 Nov 2001, Trond Myklebust wrote:

> >>>>> " " == Bob Smart <smart@hpc.CSIRO.AU> writes:
> 
>     >> Take a look at the nfs sourceforge mailing list.  There's a
>     >> long thread about this in the context of a Solaris HSM server.
>     >> I wrote a patch to try to do the right thing on a linux client,
>     >> but it is not perfect for many reasons:
>     >> http://devrandom.net/lists/archives/2001/6/NFS/0135.html
> 
>      > Thanks again for this patch. I've appended our current version
>      > to save others the trouble of minor modifications. I've just
>      > tested it against 2.4.13-ac7.
> 
>      > Your patch may not be perfect but it is a million times better
>      > than nothing. We have had no problems with it. Linux has a long
>      > tradition of utilizing imperfect software till something better
>      > comes along.
> 
>      > This patch is ESSENTIAL to make Linux useable in large
>      > enterprises, because they increasingly commonly utilize
>      > hirearchically managed storage. I strongly urge
>      > Alan/Linus/Macello to include this patch in the stable kernel
>      > tree (but marked experimental). Or at least can the kernel
>      > maintainers at Red Hat and other distributions put it in their
>      > offerings: surely you see that you need this for your big
>      > customers.
> 
> The patch may appear to work, but it fails to understand the nature of
> the problem. The problem with NFSERR_JUKEBOX is that is can appear in
> more or less *any* NFSv3 RPC call. That includes everything from
> lookups to writes.
> 
> If all you do is intercept read and setattr, then you certainly
> haven't achieved 'enterprise quality'.

Trond, 

Should'nt we make Linux retry on NFSERR_JUKEBOX _by default_ ? 

Is there any reason why we already don't do that already except "nobody
coded it yet" ? 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Red Hat needs this patch (was Re: handling NFSERR_JUKEBOX)
  2001-11-06  9:05     ` Trond Myklebust
  2001-11-06  9:10       ` Marcelo Tosatti
@ 2001-11-06 10:36       ` Bob Smart
  2001-11-06 13:54         ` Trond Myklebust
  2001-11-06 10:53       ` Red Hat needs this patch (was Re: handling NFSERR_JUKEBOX) Trond Myklebust
  2 siblings, 1 reply; 10+ messages in thread
From: Bob Smart @ 2001-11-06 10:36 UTC (permalink / raw)
  To: linux-kernel


> If all you do is intercept read and setattr, then you certainly
> haven't achieved 'enterprise quality'.

I don't have documentation on when Cray's NFSv3 server will return
NFSERR_JUKEBOX, but since inodes aren't migrated to tape I don't
think it is likely on other than read and non-truncate write with
no prior read. Our applications are unlikely to do the latter.

Maybe there is a subset of servers for which the patch is OK, and
we could make that clear in the help for the config option. I'll
ask Cray for clarification about their server.

Bob

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Red Hat needs this patch (was Re: handling NFSERR_JUKEBOX)
  2001-11-06  9:05     ` Trond Myklebust
  2001-11-06  9:10       ` Marcelo Tosatti
  2001-11-06 10:36       ` Bob Smart
@ 2001-11-06 10:53       ` Trond Myklebust
  2 siblings, 0 replies; 10+ messages in thread
From: Trond Myklebust @ 2001-11-06 10:53 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Bob Smart, Pete Wyckoff, linux-kernel, alan

>>>>> " " == Marcelo Tosatti <marcelo@conectiva.com.br> writes:

     > Trond,

     > Should'nt we make Linux retry on NFSERR_JUKEBOX _by default_ ?

Normally we probably should. The thing that worries me is the fact
that we may be holding the VFS semaphores for long periods of
time. In particular that will greatly slow down lookups...
This is the reason why I'd prefer a lot of these things to be done in
the VFS layer or above.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Red Hat needs this patch (was Re: handling NFSERR_JUKEBOX)
  2001-11-06 10:36       ` Bob Smart
@ 2001-11-06 13:54         ` Trond Myklebust
  2001-11-06 23:41           ` handling NFSERR_JUKEBOX Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2001-11-06 13:54 UTC (permalink / raw)
  To: Bob Smart; +Cc: linux-kernel

>>>>> " " == Bob Smart <smart@hpc.CSIRO.AU> writes:

     > Maybe there is a subset of servers for which the patch is OK,
     > and we could make that clear in the help for the config
     > option. I'll ask Cray for clarification about their server.

I'm still not convinced this is a good idea, but if you are going to
do things inside the NFS client, why don't you instead write a wrapper
function around rpc_call_sync() for fs/nfs/nfs3proc.c. Something like

static int
nfs_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
{
        sigset_t oldset;
        int res;
        rpc_clnt_sigmask(clnt, &oldset);
        do {
                res = rpc_call_sync(clnt, msg, flags);
                if (result != -EJUKEBOX)
                        break;
                set_current_state(TASK_INTERRUPTIBLE);
                schedule_timeout(NFS_JUKEBOX_RETRY_TIME);
                res = -ERESTARTSYS;
        } while (!signalled());
        rpc_clnt_sigunmask(clnt, &oldset);
        return res;
}

and then use a couple of '#define's to wrap rpc_call() and
rpc_call_sync() in nfs3_proc_*(). That will take care of all those
synchronous calls in one fell swoop.

You'll still need to take care of asynchronous reads, writes and
unlink, but those are easy. Just do something like

if (task->tk_error == -EJUKEBOX) {
        rpc_delay(task, NFS_JUKEBOX_RETRY_TIMEO);
        rpc_restart_call(task);
        return;
}

in nfs_readpage_result(), nfs_writeback_done(), nfs_commit_done(), and
nfs_async_unlink_done()...

Cheers,
   Trond

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: handling NFSERR_JUKEBOX
  2001-11-06 13:54         ` Trond Myklebust
@ 2001-11-06 23:41           ` Trond Myklebust
  2001-11-08 16:20             ` Pete Wyckoff
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2001-11-06 23:41 UTC (permalink / raw)
  To: Bob Smart; +Cc: linux-kernel, nfs

>>>>> " " == Trond Myklebust <trond.myklebust@fys.uio.no> writes:

     > I'm still not convinced this is a good idea, but if you are
     > going to do things inside the NFS client, why don't you instead
     > write a wrapper function around rpc_call_sync() for
     > fs/nfs/nfs3proc.c. Something like

...and here's the full patch that implements it...

Cheers,
  Trond

diff -u --recursive --new-file linux-2.4.14/fs/nfs/nfs3proc.c linux-2.4.14-jukebox/fs/nfs/nfs3proc.c
--- linux-2.4.14/fs/nfs/nfs3proc.c	Mon Oct  1 22:45:37 2001
+++ linux-2.4.14-jukebox/fs/nfs/nfs3proc.c	Wed Nov  7 00:25:19 2001
@@ -17,6 +17,37 @@
 
 #define NFSDBG_FACILITY		NFSDBG_PROC
 
+/* A wrapper to handle the EJUKEBOX error message */
+static int
+nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
+{
+	sigset_t oldset;
+	int res;
+	rpc_clnt_sigmask(clnt, &oldset);
+	do {
+		res = rpc_call_sync(clnt, msg, flags);
+		if (res != -EJUKEBOX)
+			break;
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout(NFS_JUKEBOX_RETRY_TIME);
+		res = -ERESTARTSYS;
+	} while (!signalled());
+	rpc_clnt_sigunmask(clnt, &oldset);
+	return res;
+}
+
+static inline int
+nfs3_rpc_call_wrapper(struct rpc_clnt *clnt, u32 proc, void *argp, void *resp, int flags)
+{
+	struct rpc_message msg = { proc, argp, resp, NULL };
+	return nfs3_rpc_wrapper(clnt, &msg, flags);
+}
+
+#define rpc_call(clnt, proc, argp, resp, flags) \
+		nfs3_rpc_call_wrapper(clnt, proc, argp, resp, flags)
+#define rpc_call_sync(clnt, msg, flags) \
+		nfs3_rpc_wrapper(clnt, msg, flags)
+
 /*
  * Bare-bones access to getattr: this is for nfs_read_super.
  */
diff -u --recursive --new-file linux-2.4.14/fs/nfs/read.c linux-2.4.14-jukebox/fs/nfs/read.c
--- linux-2.4.14/fs/nfs/read.c	Thu Oct 11 17:12:52 2001
+++ linux-2.4.14-jukebox/fs/nfs/read.c	Wed Nov  7 00:25:19 2001
@@ -437,6 +437,9 @@
 	dprintk("NFS: %4d nfs_readpage_result, (status %d)\n",
 		task->tk_pid, task->tk_status);
 
+	if (nfs_async_handle_jukebox(task))
+		return;
+
 	nfs_refresh_inode(inode, &data->fattr);
 	while (!list_empty(&data->pages)) {
 		struct nfs_page *req = nfs_list_entry(data->pages.next);
diff -u --recursive --new-file linux-2.4.14/fs/nfs/unlink.c linux-2.4.14-jukebox/fs/nfs/unlink.c
--- linux-2.4.14/fs/nfs/unlink.c	Thu Aug 16 18:39:37 2001
+++ linux-2.4.14-jukebox/fs/nfs/unlink.c	Wed Nov  7 00:25:19 2001
@@ -123,6 +123,8 @@
 	struct dentry		*dir = data->dir;
 	struct inode		*dir_i;
 
+	if (nfs_async_handle_jukebox(task))
+		return;
 	if (!dir)
 		return;
 	dir_i = dir->d_inode;
diff -u --recursive --new-file linux-2.4.14/fs/nfs/write.c linux-2.4.14-jukebox/fs/nfs/write.c
--- linux-2.4.14/fs/nfs/write.c	Thu Oct 11 17:12:52 2001
+++ linux-2.4.14-jukebox/fs/nfs/write.c	Wed Nov  7 00:25:19 2001
@@ -1229,6 +1229,9 @@
 	dprintk("NFS: %4d nfs_writeback_done (status %d)\n",
 		task->tk_pid, task->tk_status);
 
+	if (nfs_async_handle_jukebox(task))
+		return;
+
 	/* We can't handle that yet but we check for it nevertheless */
 	if (resp->count < argp->count && task->tk_status >= 0) {
 		static unsigned long    complain;
@@ -1422,6 +1425,9 @@
         dprintk("NFS: %4d nfs_commit_done (status %d)\n",
                                 task->tk_pid, task->tk_status);
 
+	if (nfs_async_handle_jukebox(task))
+		return;
+
 	nfs_write_attributes(inode, resp->fattr);
 	while (!list_empty(&data->pages)) {
 		req = nfs_list_entry(data->pages.next);
diff -u --recursive --new-file linux-2.4.14/include/linux/nfs_fs.h linux-2.4.14-jukebox/include/linux/nfs_fs.h
--- linux-2.4.14/include/linux/nfs_fs.h	Wed Oct 24 07:00:07 2001
+++ linux-2.4.14-jukebox/include/linux/nfs_fs.h	Wed Nov  7 00:25:19 2001
@@ -16,6 +16,7 @@
 
 #include <linux/sunrpc/debug.h>
 #include <linux/sunrpc/auth.h>
+#include <linux/sunrpc/clnt.h>
 
 #include <linux/nfs.h>
 #include <linux/nfs2.h>
@@ -326,6 +327,29 @@
 	__retval;							\
 })
 
+#ifdef CONFIG_NFS_V3
+
+#define NFS_JUKEBOX_RETRY_TIME (5 * HZ)
+static inline int
+nfs_async_handle_jukebox(struct rpc_task *task)
+{
+	if (task->tk_status != -EJUKEBOX)
+		return 0;
+	task->tk_status = 0;
+	rpc_restart_call(task);
+	rpc_delay(task, NFS_JUKEBOX_RETRY_TIME);
+	return 1;
+}
+
+#else
+
+static inline int
+nfs_async_handle_jukebox(struct rpc_task *task)
+{
+	return 0;
+}
+#endif /* CONFIG_NFS_V3 */
+
 #endif /* __KERNEL__ */
 
 /*

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: handling NFSERR_JUKEBOX
  2001-11-06 23:41           ` handling NFSERR_JUKEBOX Trond Myklebust
@ 2001-11-08 16:20             ` Pete Wyckoff
  0 siblings, 0 replies; 10+ messages in thread
From: Pete Wyckoff @ 2001-11-08 16:20 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Bob Smart, linux-kernel, nfs

trond.myklebust@fys.uio.no said:
> > I'm still not convinced this is a good idea, but if you are
> > going to do things inside the NFS client, why don't you instead
> > write a wrapper function around rpc_call_sync() for
> > fs/nfs/nfs3proc.c. Something like
> 
> ...and here's the full patch that implements it...

Works like a charm.  Glad you allowed yourself to be badgered
into writing the proper code to handle EJUKEBOX.  I'll destroy
my icky version.

Can you push it forward to Linus for general inclusion?

		-- Pete

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2001-11-08 16:20 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-07-10  0:23 handling NFSERR_JUKEBOX Bob Smart
2001-07-13 14:06 ` Pete Wyckoff
2001-11-06  5:21   ` Red Hat needs this patch (was Re: handling NFSERR_JUKEBOX) Bob Smart
2001-11-06  9:05     ` Trond Myklebust
2001-11-06  9:10       ` Marcelo Tosatti
2001-11-06 10:36       ` Bob Smart
2001-11-06 13:54         ` Trond Myklebust
2001-11-06 23:41           ` handling NFSERR_JUKEBOX Trond Myklebust
2001-11-08 16:20             ` Pete Wyckoff
2001-11-06 10:53       ` Red Hat needs this patch (was Re: handling NFSERR_JUKEBOX) Trond Myklebust

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).