* Clarification on client "async" option
From: Andrew Martin @ 2014-09-04 22:23 UTC
  To: linux-nfs

Hello,

I would like to understand in more detail how the client-side "async" option
works with NFSv3 when used with the NFSv3 server-side option "sync" (async on
the client, sync on the server). According to the manpage:

>       The NFS client treats the sync mount option differently than some other
> file systems (refer to mount(8) for a description of the generic sync and async
> mount options).  If neither sync nor async is specified (or if the async option
> is specified), the NFS client delays sending application writes to the server
> until any of these  events occur:
>
>              Memory pressure forces reclamation of system memory resources.
>
>              An application flushes file data explicitly with sync(2),
>              msync(2), or fsync(3).
>
>              An application closes a file with close(2).
>
>              The file is locked/unlocked via fcntl(2).


When performing a sample strace, e.g. with rsync:
strace -f rsync -av hosts /mn/nfs/dest

I see the following:
[pid 10670] read(3, "127.0.0.1\tlocalhost\n#127.0.1.1\tv"..., 350) = 350
[pid 10670] close(3)                    = 0
[pid 10670] select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {59, 999998})
[pid 10670] write(4, "\211\1\0\7\2\0\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0^\1\0\000127.0"..., 397 <unfinished ...>
[pid 10672] <... select resumed> )      = 1 (in [0], left {59, 999185})
[pid 10670] <... write resumed> )       = 397
[pid 10672] read(0,  <unfinished ...>
[pid 10670] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid 10672] <... read resumed> "\211\1\0\7", 4) = 4
[pid 10672] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999998})
[pid 10672] read(0, "\2\0\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0^\1\0\000127.0.0.1"..., 393) = 393
[pid 10672] open("dest", O_RDONLY)      = -1 ENOENT (No such file or directory)
[pid 10672] open(".dest.y4ihWF", O_RDWR|O_CREAT|O_EXCL, 0600) = 1
[pid 10672] fchmod(1, 0600)             = 0
[pid 10672] mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7c28b49000
[pid 10672] write(1, "127.0.0.1\tlocalhost\n#127.0.1.1\tv"..., 350) = 350
[pid 10672] close(1)                    = 0
[pid 10672] lstat(".dest.y4ihWF", {st_mode=S_IFREG|0600, st_size=350, ...}) = 0
[pid 10672] utimensat(AT_FDCWD, ".dest.y4ihWF", {UTIME_NOW, {1357852439, 0}}, AT_SYMLINK_NOFOLLOW) = 0
[pid 10672] chmod(".dest.y4ihWF", 0644) = 0
[pid 10672] rename(".dest.y4ihWF", "dest") = 0

This shows that a temporary file is written and then closed; the file is
then chmodded and renamed to the final destination filename. Do the
chmod(2) and rename(2) calls force a COMMIT to be sent, flushing these changes
to stable storage on the NFS server? Or, is there a possibility that during a
power failure of both client and server, the file would remain as .dest.y4ihWF
on the server?


Thanks,

Andrew Martin


* Re: Clarification on client "async" option
From: Trond Myklebust @ 2014-09-05  1:52 UTC
  To: Andrew Martin; +Cc: Linux NFS Mailing List

On Thu, Sep 4, 2014 at 6:23 PM, Andrew Martin <amartin@xes-inc.com> wrote:
> Hello,
>
> I would like to understand in more detail how the client-side "async" option
> works with NFSv3 when used with the NFSv3 server-side option "sync" (async on
> the client, sync on the server). According to the manpage:
>
>>       The NFS client treats the sync mount option differently than some other
>> file systems (refer to mount(8) for a description of the generic sync and async
>> mount options).  If neither sync nor async is specified (or if the async option
>> is specified), the NFS client delays sending application writes to the server
>> until any of these  events occur:
>>
>>              Memory pressure forces reclamation of system memory resources.
>>
>>              An application flushes file data explicitly with sync(2),
>>              msync(2), or fsync(3).
>>
>>              An application closes a file with close(2).
>>
>>              The file is locked/unlocked via fcntl(2).
>
>
> When performing a sample strace, e.g with rsync:
> strace -f rsync -av hosts /mn/nfs/dest
>
> I see the following:
> [pid 10670] read(3, "127.0.0.1\tlocalhost\n#127.0.1.1\tv"..., 350) = 350
> [pid 10670] close(3)                    = 0
> [pid 10670] select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {59, 999998})
> [pid 10670] write(4, "\211\1\0\7\2\0\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0^\1\0\000127.0"..., 397 <unfinished ...>
> [pid 10672] <... select resumed> )      = 1 (in [0], left {59, 999185})
> [pid 10670] <... write resumed> )       = 397
> [pid 10672] read(0,  <unfinished ...>
> [pid 10670] select(6, [5], [], NULL, {60, 0} <unfinished ...>
> [pid 10672] <... read resumed> "\211\1\0\7", 4) = 4
> [pid 10672] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999998})
> [pid 10672] read(0, "\2\0\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0^\1\0\000127.0.0.1"..., 393) = 393
> [pid 10672] open("dest", O_RDONLY)      = -1 ENOENT (No such file or directory)
> [pid 10672] open(".dest.y4ihWF", O_RDWR|O_CREAT|O_EXCL, 0600) = 1
> [pid 10672] fchmod(1, 0600)             = 0
> [pid 10672] mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7c28b49000
> [pid 10672] write(1, "127.0.0.1\tlocalhost\n#127.0.1.1\tv"..., 350) = 350
> [pid 10672] close(1)                    = 0
> [pid 10672] lstat(".dest.y4ihWF", {st_mode=S_IFREG|0600, st_size=350, ...}) = 0
> [pid 10672] utimensat(AT_FDCWD, ".dest.y4ihWF", {UTIME_NOW, {1357852439, 0}}, AT_SYMLINK_NOFOLLOW) = 0
> [pid 10672] chmod(".dest.y4ihWF", 0644) = 0
> [pid 10672] rename(".dest.y4ihWF", "dest") = 0
>
> This shows that a temporary filename is written and then closed, however the
> file is then chmodded and renamed to the final destination filename. Do the
> chmod(2) and rename(2) calls force a COMMIT to be sent, flushing these changes
> to stable storage on the NFS server? Or, is there a possibility that during a
> power failure of both client and server, the file would remain as .dest.y4ihWF
> on the server?

In NFSv3, the close() will cause the client to flush all data to stable storage.
The client will also flush data to stable storage on a chmod, since
that could potentially affect its ability to write back the data. It
will not bother to do so for rename.
An application should normally be able to rely on the data being
safely on disk in both these situations provided that the server
honours the NFS protocol (with a caveat that an ill-timed 'kill -9'
could interrupt the process of flushing).
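
For illustration only, the write/close/chmod/rename sequence from the strace
can be sketched in Python with an explicit fsync() added, so the application
does not depend solely on close() to trigger the flush. The helper name
atomic_write and its parameters are hypothetical. The directory fsync at the
end is an extra precaution that is an assumption on my part, not something
stated in this thread.

```python
import os
import tempfile


def atomic_write(dest, data, mode=0o644):
    """Write data to a temp file beside dest, flush it, then rename it.

    Hypothetical helper; not part of rsync or the NFS client.
    """
    directory = os.path.dirname(os.path.abspath(dest))
    # mkstemp opens with O_RDWR|O_CREAT|O_EXCL and mode 0600,
    # much like the open() call in the strace.
    fd, tmp = tempfile.mkstemp(prefix=".", dir=directory)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        os.fchmod(fd, mode)
        # Ask for data and metadata to be flushed before the rename.
        os.fsync(fd)
    except BaseException:
        os.close(fd)
        os.unlink(tmp)
        raise
    os.close(fd)
    os.rename(tmp, dest)
    # Assumption: also fsync the parent directory so the new name itself
    # is flushed on filesystems where that is a separate step.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as atomic_write("/mnt/nfs/dest", b"...") (path hypothetical)
either leaves the old file in place or the complete new one, which is the
property the temporary-name approach is after.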

All metadata operations such as create, chmod, rename, etc. will cause
the server to flush the file metadata to disk assuming that you set
the (highly recommended) "sync" export option. If "sync" is set, the
server will also honour COMMIT requests by flushing the data to stable
storage.
If, OTOH, your server lists the "async" export option as being set,
then COMMIT is considered a no-op, and it will not bother to
explicitly flush metadata operations to stable storage. Performance
will scream, but be prepared to lose data if that server crashes. This
is all technically a violation of the NFS spec, however you have been
given rope...
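
As a rough aid for auditing a server, the following Python sketch flags
clients on an exports(5) line that carry the "async" option. The function
name async_clients and the sample paths and hosts are hypothetical. The
parsing is deliberately simplified: it assumes no quoted paths containing
spaces and ignores default-option fields that start with "-".

```python
import re


def async_clients(exports_line):
    """Return the client specs on one exports(5) line that set 'async'.

    Simplified, hypothetical parser; not a replacement for exportfs.
    """
    # Drop trailing comments and surrounding whitespace.
    line = exports_line.split("#", 1)[0].strip()
    if not line:
        return []
    fields = line.split()
    risky = []
    # fields[0] is the exported path; the rest are client(options) entries.
    for field in fields[1:]:
        match = re.fullmatch(r"([^()]*)\(([^()]*)\)", field)
        if not match:
            continue  # bare client name: no explicit options to inspect
        client, options = match.groups()
        if "async" in options.split(","):
            # An empty client name means the options apply to any host.
            risky.append(client or "*")
    return risky
```

For example, async_clients("/srv/nfs 192.168.1.0/24(rw,async) backup(rw,sync)")
reports only the first client, which is the one where COMMIT would be treated
as a no-op as described above.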

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com


* Re: Clarification on client "async" option
From: Andrew Martin @ 2014-09-08 18:49 UTC
  To: Trond Myklebust; +Cc: Linux NFS Mailing List

----- Original Message -----
> From: "Trond Myklebust" <trond.myklebust@primarydata.com>
> > This shows that a temporary filename is written and then closed, however
> > the
> > file is then chmodded and renamed to the final destination filename. Do the
> > chmod(2) and rename(2) calls force a COMMIT to be sent, flushing these
> > changes
> > to stable storage on the NFS server? Or, is there a possibility that during
> > a
> > power failure of both client and server, the file would remain as
> > .dest.y4ihWF
> > on the server?
> 
> In NFSv3, the close() will cause the client to flush all data to stable
> storage.
> The client will also flush data to stable storage on a chmod, since
> that could potentially affect its ability to write back the data. It
> will not bother to do so for rename.
> An application should normally be able to rely on the data being
> safely on disk in both these situations provided that the server
> honours the NFS protocol (with a caveat that an ill-timed 'kill -9'
> could interrupt the process of flushing).
> 
> All metadata operations such as create, chmod, rename, etc. will cause
> the server to flush the file metadata to disk assuming that you set
> the (highly recommended) "sync" export option. If "sync" is set, the
> server will also honour COMMIT requests by flushing the data to stable
> storage.

Thanks for the clarification - I will use "sync" on the server side and
"async" on the client side, since I know now that this combination will
provide both data and metadata safety.

Andrew


* Re: Clarification on client "async" option
From: Malahal Naineni @ 2014-09-08 23:11 UTC
  To: Andrew Martin; +Cc: Trond Myklebust, Linux NFS Mailing List

Andrew Martin [amartin@xes-inc.com] wrote:
> ----- Original Message -----
> > From: "Trond Myklebust" <trond.myklebust@primarydata.com>
> > > This shows that a temporary filename is written and then closed, however
> > > the
> > > file is then chmodded and renamed to the final destination filename. Do the
> > > chmod(2) and rename(2) calls force a COMMIT to be sent, flushing these
> > > changes
> > > to stable storage on the NFS server? Or, is there a possibility that during
> > > a
> > > power failure of both client and server, the file would remain as
> > > .dest.y4ihWF
> > > on the server?
> > 
> > In NFSv3, the close() will cause the client to flush all data to stable
> > storage.
> > The client will also flush data to stable storage on a chmod, since
> > that could potentially affect its ability to write back the data. It
> > will not bother to do so for rename.
> > An application should normally be able to rely on the data being
> > safely on disk in both these situations provided that the server
> > honours the NFS protocol (with a caveat that an ill-timed 'kill -9'
> > could interrupt the process of flushing).
> > 
> > All metadata operations such as create, chmod, rename, etc. will cause
> > the server to flush the file metadata to disk assuming that you set
> > the (highly recommended) "sync" export option. If "sync" is set, the
> > server will also honour COMMIT requests by flushing the data to stable
> > storage.
> 
> Thanks for the clarification - I will use "sync" on the server side and
> "async" on the client side, since I know now that this combination will
> provide both data and metadata safety.

That should be the default too.

