linux-kernel.vger.kernel.org archive mirror
* PROBLEM: knfsd misses occasional writes
@ 2002-05-15 12:12 Sverker Wiberg
  2002-05-15 12:18 ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Sverker Wiberg @ 2002-05-15 12:12 UTC (permalink / raw)
  To: Linux Kernel Mailing List


Hello everyone, 

When copying lots of small files from multiple NFS clients to a kNFSd
filesystem (i.e. doing backup of a cluster), exported with `sync', I
find that some few files (1 out of 1000) were silently truncated to zero
size when checking locally with `ls' (the clients reported total
success). With `async' instead, all files were correctly copied.
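
A check like the one described above (looking for files that arrived with zero size) could be scripted roughly as follows. This is only a sketch: the helper name `find_truncated` is hypothetical, and it assumes the source tree and the copied tree are both reachable from the machine running it.

```python
import os


def find_truncated(src_root, dst_root):
    """Return relative paths that are non-empty in src_root but empty in dst_root."""
    truncated = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(src_path, src_root)
            dst_path = os.path.join(dst_root, rel_path)
            if not os.path.isfile(dst_path):
                # A missing copy is a different failure; this sketch skips it.
                continue
            if os.path.getsize(src_path) > 0 and os.path.getsize(dst_path) == 0:
                truncated.append(rel_path)
    return sorted(truncated)
```

Files that are legitimately empty at the source are not reported, so only silent truncations of the kind described here show up in the result.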

I have seen this behaviour in 2.4.17 (UP and SMP builds, UP hardware) as
well as 2.4.18, when using the NFSv2 protocol. I have not tried 2.5.x
and NFSv3 yet. The full /etc/exports line is:

   /opt/telorb 172.16.0.0/255.255.0.0(rw,sync,no_wdelay)

Removing `no_wdelay' makes no difference.

The clients are all 2.4.17, and the relevant .config lines (for both
server and clients) are:

   CONFIG_NFS_FS=y
   CONFIG_NFS_V3=y
   CONFIG_ROOT_NFS=y
   CONFIG_NFSD=y
   CONFIG_NFSD_V3=y
   CONFIG_SUNRPC=y
   CONFIG_LOCKD=y
   CONFIG_LOCKD_V4=y

Reading the source (fs/nfsd/*) seems to show that knfsd tries to do the
right thing.

/Sverker Wiberg


* Re: PROBLEM: knfsd misses occasional writes
@ 2002-05-15 12:18 ` Neil Brown
  2002-05-16 10:49   ` Sverker Wiberg
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2002-05-15 12:18 UTC (permalink / raw)
  To: Sverker Wiberg; +Cc: Linux Kernel Mailing List

On Wednesday May 15, Sverker.Wiberg@uab.ericsson.se wrote:
> 
> Hello everyone, 
> 
> When copying lots of small files from multiple NFS clients to a kNFSd
> filesystem (i.e. doing backup of a cluster), exported with `sync', I
> find that some few files (1 out of 1000) were silently truncated to zero
> size when checking locally with `ls' (the clients reported total
> success). With `async' instead, all files were correctly copied.

How are you mounting the file systems on the clients?
The symptoms sound exactly like you are using "soft" mounts.  "soft"
is a very bad mount option.  Use "hard".

If you aren't using "soft", let me know and I will look harder.

NeilBrown

> 
> I have seen this behaviour in 2.4.17 (UP and SMP builds, UP hardware) as
> well as 2.4.18, when using the NFSv2 protocol. I have not tried 2.5.x
> and NFSv3 yet. The full /etc/exports line is:
> 
>    /opt/telorb 172.16.0.0/255.255.0.0(rw,sync,no_wdelay)
> 
> Removing `no_wdelay' makes no difference.
> 
> The clients are all 2.4.17, and the relevant .config lines (for both
> server and clients) are:
> 
>    CONFIG_NFS_FS=y
>    CONFIG_NFS_V3=y
>    CONFIG_ROOT_NFS=y
>    CONFIG_NFSD=y
>    CONFIG_NFSD_V3=y
>    CONFIG_SUNRPC=y
>    CONFIG_LOCKD=y
>    CONFIG_LOCKD_V4=y
> 
> Reading the source (fs/nfsd/*) seems to show that knfsd tries to do the
> right thing.
> 
> /Sverker Wiberg


* Re: PROBLEM: knfsd misses occasional writes
  2002-05-15 12:18 ` Neil Brown
@ 2002-05-16 10:49   ` Sverker Wiberg
  2002-05-16 11:39     ` Neil Brown
  2002-05-16 20:34     ` G Sandine
  0 siblings, 2 replies; 9+ messages in thread
From: Sverker Wiberg @ 2002-05-16 10:49 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux Kernel Mailing List

Neil Brown wrote:
> 
> On Wednesday May 15, Sverker.Wiberg@uab.ericsson.se wrote:
> >
> > Hello everyone,
> >
> > When copying lots of small files from multiple NFS clients to a kNFSd
> > filesystem (i.e. doing backup of a cluster), exported with `sync', I
> > find that some few files (1 out of 1000) were silently truncated to zero
                                                  ^^^^^^^^
                                                  no errors reported 

> > size when checking locally with `ls' (the clients reported total
> > success). With `async' instead, all files were correctly copied.
> 
> How are you mounting the file systems on the clients?
> The symptoms sound exactly like you are using "soft" mounts.  "soft"
> is a very bad mount option.  Use "hard".
>
> If you aren't using "soft", let me know and I will look harder.

Errrm, I am using "soft" mounts, as I (we) want the clients to survive
server restarts.
But shouldn't those timeouts become errors over at the clients?

/Sverker


* Re: PROBLEM: knfsd misses occasional writes
@ 2002-05-16 11:39     ` Neil Brown
  2002-05-16 16:48       ` Sverker Wiberg
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2002-05-16 11:39 UTC (permalink / raw)
  To: Sverker Wiberg; +Cc: Linux Kernel Mailing List

On Thursday May 16, Sverker.Wiberg@uab.ericsson.se wrote:
> Neil Brown wrote:
> > 
> > On Wednesday May 15, Sverker.Wiberg@uab.ericsson.se wrote:
> > >
> > > Hello everyone,
> > >
> > > When copying lots of small files from multiple NFS clients to a kNFSd
> > > filesystem (i.e. doing backup of a cluster), exported with `sync', I
> > > find that some few files (1 out of 1000) were silently truncated to zero
>                                                   ^^^^^^^^
>                                                   no errors reported 
> 
> > > size when checking locally with `ls' (the clients reported total
> > > success). With `async' instead, all files were correctly copied.
> > 
> > How are you mounting the file systems on the clients?
> > The symptoms sound exactly like you are using "soft" mounts.  "soft"
> > is a very bad mount option.  Use "hard".
> >
> > If you aren't using "soft", let me know and I will look harder.
> 
> Errrm, I am using "soft" mounts, as I (we) want the clients to survive
> server restarts.

What do you mean by "survive"?  What you probably want is
   hard,intr
so that clients will wait for the server to come back, but you can
interrupt processes successfully.
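
One way to audit clients for "soft" mounts is to scan the mount table. The sketch below is an assumption-laden illustration, not part of the original exchange: the helper name `soft_nfs_mounts` is hypothetical, it expects text in the /proc/mounts layout (device, mount point, type, options), and whether the kernel actually lists `soft` or `hard` there depends on the kernel version.

```python
def soft_nfs_mounts(mounts_text):
    """Return mount points of NFS filesystems whose options include 'soft'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete mount-table entry
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("nfs", "nfs4") and "soft" in options.split(","):
            result.append(mount_point)
    return result
```

On a client it could be fed with `soft_nfs_mounts(open("/proc/mounts").read())`; any mount point it returns is a candidate for remounting with `hard,intr`.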

> But shouldn't those timeouts become errors over at the clients?

Yes... but "write" won't see an error.  Only 'fsync' or maybe 'close',
and many applications ignore errors from these operations.
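
As an illustration of that point, a writer that does not ignore those errors might look like the sketch below. The helper name `write_checked` is hypothetical, and the code only shows where the errors can surface; it is not a fix for the truncation discussed in this thread.

```python
import os


def write_checked(path, data):
    """Write data to path, raising OSError if write, fsync, or close fails."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        remaining = memoryview(data)
        while len(remaining) > 0:
            # write() may accept fewer bytes than requested, so loop.
            written = os.write(fd, remaining)
            remaining = remaining[written:]
        # Force the data out now, so a failed flush is reported here.
        os.fsync(fd)
    except OSError:
        try:
            os.close(fd)
        except OSError:
            pass  # keep the original error, which is the more useful one
        raise
    # close() can report an error too; let it propagate to the caller.
    os.close(fd)
```

A copy tool built this way would fail loudly on a timed-out soft mount instead of reporting success.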

NeilBrown


* Re: PROBLEM: knfsd misses occasional writes
  2002-05-16 11:39     ` Neil Brown
@ 2002-05-16 16:48       ` Sverker Wiberg
  0 siblings, 0 replies; 9+ messages in thread
From: Sverker Wiberg @ 2002-05-16 16:48 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux Kernel Mailing List

Neil Brown wrote:
> 
> On Thursday May 16, Sverker.Wiberg@uab.ericsson.se wrote:

[on soft mount timeouts]
> > But shouldn't those timeouts become errors over at the clients?
> 
> Yes... but "write" won't see an error.  Only 'fsync' or maybe 'close',
> and many applications ignore errors from these operations.

How come? Isn't the client side innately synchronous (as RPC clients in
general)?
Or is this one of those things that are now done differently?

/Sverker


* Re: PROBLEM: knfsd misses occasional writes
  2002-05-16 10:49   ` Sverker Wiberg
  2002-05-16 11:39     ` Neil Brown
@ 2002-05-16 20:34     ` G Sandine
  2002-05-17 10:38       ` Sverker Wiberg
  1 sibling, 1 reply; 9+ messages in thread
From: G Sandine @ 2002-05-16 20:34 UTC (permalink / raw)
  To: linux-kernel

On Thu, May 16, 2002 at 12:49:01PM +0200, Sverker Wiberg wrote:
> Neil Brown wrote:
> > On Wednesday May 15, Sverker.Wiberg@uab.ericsson.se wrote:
> > > When copying lots of small files from multiple NFS clients to a kNFSd
> > > filesystem (i.e. doing backup of a cluster), exported with `sync', I
> > > find that some few files (1 out of 1000) were silently truncated to zero
>                                                   ^^^^^^^^
>                                                   no errors reported 
> > 
> > How are you mounting the file systems on the clients?
> > The symptoms sound exactly like you are using "soft" mounts.  "soft"
> > is a very bad mount option.  Use "hard".
> >
> > If you aren't using "soft", let me know and I will look harder.
> 
> Errrm, I am using "soft" mounts, as I (we) want the clients to survive
> server restarts.
> But shouldn't those timeouts become errors over at the clients?

I have seen this too, with a file system exported with rw,no_root_squash
and mounted hard,intr.  We were running vanilla 2.4.18 on the server
and clients.  We have a text file on the server serving to record
employees' time, and one day the time clock file remained a text file
but was truncated to zero.  All further punch ins/punch outs did not
record in the truncated file (user names, dates, and times should have
appended).  Deleting and recreating the text file on a client returned
behavior to normal.  No error messages whatsoever, and it has worked
fine for two weeks as we watch for the behavior to repeat.

Regards,
Gary S.


* Re: PROBLEM: knfsd misses occasional writes
  2002-05-16 20:34     ` G Sandine
@ 2002-05-17 10:38       ` Sverker Wiberg
  2002-05-17 12:32         ` PROBLEM: knfsd misses occasional writes (w/ WORKAROUND) Sverker Wiberg
  0 siblings, 1 reply; 9+ messages in thread
From: Sverker Wiberg @ 2002-05-17 10:38 UTC (permalink / raw)
  To: G Sandine; +Cc: linux-kernel

G Sandine wrote:

> [...]and one day the time clock file remained a text file
> but was truncated to zero. All further punch ins/punch outs did not
> record in the truncated file (user names, dates, and times should have
> appended).

Sounds as if you've got the file's ownership and perms clobbered. I'll
check if that happens over here as well.

> Deleting and recreating the text file on a client returned
> behavior to normal.  No error messages whatsoever, and it has worked
> fine for two weeks as we watch for the behavior to repeat.

Then we're (cough!) luckier over here, we can recreate the problem in
about an hour. But at least it's nice to know you're not alone.

[To the lkml:]

Over here, we started to log the conversations, and saw the client
opening a file, writing 272 bytes into it (one write), and then closing
it, with the server replying full success all the time. printk()'s in
knfsd and the vfs's generic_write() also reported that 272 bytes had
been successfully written. Yet the file was truncated.

We switched from soft to hard mount: It didn't help. We are now
experimenting with disabling SCSI's disconnect/reconnect feature. Are
there any more straws to grasp at?

/Sverker


* Re: PROBLEM: knfsd misses occasional writes (w/ WORKAROUND)
  2002-05-17 10:38       ` Sverker Wiberg
@ 2002-05-17 12:32         ` Sverker Wiberg
  0 siblings, 0 replies; 9+ messages in thread
From: Sverker Wiberg @ 2002-05-17 12:32 UTC (permalink / raw)
  To: linux-kernel

Sverker Wiberg wrote:

> Over here, we started to log the conversations, and saw the client
> opening a file, writing 272 bytes into it (one write), and then closing
> it, with the server replying full success all the time. printk()'s in
> knfsd and the vfs's generic_write() also reported that 272 bytes had
> been successfully written. Yet the file was truncated.
> 
> We switched from soft to hard mount: It didn't help. We are now
> experimenting with disabling SCSI's disconnect/reconnect feature. Are
> there any more straws to grasp at?

With one single knfsd thread running, the problem went away (for a price
in performance). This seems to indicate there is some kind of race
between the knfsd threads. 

/Sverker


* Re: PROBLEM: knfsd misses occasional writes
@ 2002-05-17  2:32 Neil Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Neil Brown @ 2002-05-17  2:32 UTC (permalink / raw)
  To: Sverker Wiberg; +Cc: Linux Kernel Mailing List

On Thursday May 16, Sverker.Wiberg@uab.ericsson.se wrote:
> Neil Brown wrote:
> > 
> > On Thursday May 16, Sverker.Wiberg@uab.ericsson.se wrote:
> 
> [on soft mount timeouts]
> > > But shouldn't those timeouts become errors over at the clients?
> > 
> > Yes... but "write" won't see an error.  Only 'fsync' or maybe 'close',
> > and many applications ignore errors from these operations.
> 
> How come? Isn't the client side innately synchronous (as RPC clients in
> general)?

No way!  That would kill performance.

The application writes into the pagecache.  The nfs client, possibly
using a helper thread like rpciod, writes asynchronously to the
server.  Data is only flushed on close or fsync or memory pressure
or...
I have only a passing knowledge of this stuff though.  I trust Trond
will correct me if I say anything really silly.
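
The deferred-error behaviour described above can be mimicked with a toy model. Everything here is hypothetical (the class `ToyWritebackFile`, the `send` callback); it is a deliberately simplified illustration and not how the Linux NFS client is implemented.

```python
class ToyWritebackFile:
    """Toy write-back cache: write() buffers, errors surface at fsync/close."""

    def __init__(self, send):
        self._send = send            # callable that "sends" bytes to a server
        self._dirty = []             # buffered chunks not yet sent
        self._pending_error = None   # error remembered from a failed flush

    def write(self, data):
        # Only buffers the data, so it cannot observe a server failure.
        self._dirty.append(bytes(data))
        return len(data)

    def _flush(self):
        while self._dirty:
            chunk = self._dirty.pop(0)
            try:
                self._send(chunk)
            except OSError as exc:
                # Give up on the remaining data, as a timed-out soft mount would.
                self._pending_error = exc
                self._dirty.clear()
                break

    def fsync(self):
        self._flush()
        if self._pending_error is not None:
            err, self._pending_error = self._pending_error, None
            raise err

    def close(self):
        # The last chance to report a failed flush to the application.
        self.fsync()
```

In this model `write()` always returns the full length, and a failing `send` is only visible to a caller that checks `fsync()` or `close()`.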

> Or is this one of those things that are now done differently?

I think it was always this way.

NeilBrown


Thread overview: 9+ messages
2002-05-15 12:12 PROBLEM: knfsd misses occasional writes Sverker Wiberg
2002-05-15 12:18 ` Neil Brown
2002-05-16 10:49   ` Sverker Wiberg
2002-05-16 11:39     ` Neil Brown
2002-05-16 16:48       ` Sverker Wiberg
2002-05-16 20:34     ` G Sandine
2002-05-17 10:38       ` Sverker Wiberg
2002-05-17 12:32         ` PROBLEM: knfsd misses occasional writes (w/ WORKAROUND) Sverker Wiberg
2002-05-17  2:32 PROBLEM: knfsd misses occasional writes Neil Brown
