* Data corrupt after truncate at nfs client v3 mount point when nfs server restart
From: Kinglong Mee @ 2021-02-03 11:44 UTC
To: Linux NFS Mailing List; +Cc: Trond Myklebust, Anna Schumaker, Chuck Lever
Hello,
I hit a data corruption problem on an NFSv3 client mount point mounted
without the sync option, on CentOS 7.6 (kernel 3.10.0-957.el7.x86_64).
When running the LTP ftest01 test case while the NFS server is restarted
or rebooted, ftest01 sometimes reports data corruption or a size mismatch.
Debugging shows:
1. The WRITE reply contains a write verifier, and the COMMIT reply
contains a verifier too. knfsd encodes nfsd_net_id for the two
verifiers; other NFS servers (e.g. nfs-ganesha) encode the daemon
start time. After an NFS server restart/reboot, the verifier changes.
2. When mounted without sync, the NFS client uses buffered I/O and sends
WRITE requests with the unstable argument.
After an unstable WRITE succeeds, the request is added to the commit
list rather than being deleted directly.
3. A following sync (fsync) or setattr (truncate) triggers a COMMIT.
If the COMMIT succeeds and the returned verifier is the same as that of
the unstable WRITE in the commit list, the unstable WRITE is deleted;
otherwise the unstable WRITE is resent to the NFS server.
This happens in nfs_commit_release_pages().
4. After an NFS server restart, the COMMIT may be processed by the new
server instance, which returns a different verifier with COMMIT success.
In this case, the unstable WRITEs are resent, but the COMMIT finishes
without any error and does not wait for the resent WRITEs.
5. For sync (fsync) this is okay, but for setattr (truncate) data
corruption appears, because the resent WRITEs may be sent to the server
at the same time as the SETATTR for the truncate.
6. nfs_setattr() only calls nfs_sync_inode() without checking the
result:
	/* Write all dirty data */
	if (S_ISREG(inode->i_mode))
		nfs_sync_inode(inode);
Also, nfs_initiate_commit() does not return an error for FLUSH_SYNC
when the COMMIT hits an error.
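The verifier comparison described in steps 3 and 4 can be sketched in user-space C as follows. This is only an illustrative model, not the kernel code: struct sketch_req and sketch_commit_release() are hypothetical names, and the only detail taken from the protocol is that an NFSv3 write verifier is 8 opaque bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define VERF_SIZE 8 /* an NFSv3 write verifier is 8 opaque bytes */

/* Hypothetical stand-in for one request on the client's commit list. */
struct sketch_req {
	unsigned char write_verf[VERF_SIZE]; /* verifier from the unstable WRITE reply */
	bool committed;                      /* data is known to be on stable storage */
	bool needs_resend;                   /* COMMIT verifier differed, write again */
};

/*
 * Compare the verifier from a successful COMMIT reply with the verifier
 * saved for each unstable WRITE. Matching requests are done; the others
 * must be written again. Returns the number of requests to resend.
 */
static size_t sketch_commit_release(struct sketch_req *reqs, size_t n,
				    const unsigned char commit_verf[VERF_SIZE])
{
	size_t resend = 0;

	assert(reqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (memcmp(reqs[i].write_verf, commit_verf, VERF_SIZE) == 0) {
			reqs[i].committed = true;
			reqs[i].needs_resend = false;
		} else {
			/* Server restarted between WRITE and COMMIT. */
			reqs[i].committed = false;
			reqs[i].needs_resend = true;
			resend++;
		}
	}
	return resend;
}
```

In this model a non-zero return value is the case from step 4: the COMMIT itself succeeded, yet some data still has to be resent before a following SETATTR is safe.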
Maybe we should return an error to the upper caller (nfs_setattr()) that
is doing a FLUSH_SYNC commit when the COMMIT returns a different verifier
from the unstable WRITEs. With the error, the upper caller may return
the error or call nfs_sync_inode() again.
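A rough user-space model of that idea is below, assuming the sync path reports a verifier change as an error and the caller retries a bounded number of times before sending SETATTR. All names here (struct sketch_server, sketch_sync_once(), sketch_setattr_sync()) and the choice of -EAGAIN are hypothetical and only illustrate the control flow; this is not a patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical server model: boot_id stands in for the write verifier. */
struct sketch_server {
	unsigned int boot_id;
	int restarts_pending; /* restarts that will hit between WRITE and COMMIT */
};

/*
 * One flush pass: unstable WRITE followed by COMMIT. Returns -EAGAIN
 * (an assumed error code) when the COMMIT verifier differs from the
 * WRITE verifier, i.e. the data has to be written again.
 */
static int sketch_sync_once(struct sketch_server *srv)
{
	unsigned int write_verf = srv->boot_id;

	if (srv->restarts_pending > 0) {
		/* Server restarts after the WRITE reply, before the COMMIT. */
		srv->restarts_pending--;
		srv->boot_id++;
	}
	return write_verf == srv->boot_id ? 0 : -EAGAIN;
}

/*
 * What a caller such as nfs_setattr() could do with that error: retry
 * the sync, and only proceed to SETATTR when a pass ends cleanly.
 */
static int sketch_setattr_sync(struct sketch_server *srv, int max_tries)
{
	assert(srv != NULL && max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		int err = sketch_sync_once(srv);

		if (err != -EAGAIN)
			return err;
	}
	return -EAGAIN; /* give up and report the failure to the caller */
}
```

With one restart pending, the first pass fails and the second succeeds, so the truncate would only be sent once no resent WRITE is still in flight.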
Although I hit this problem on an older kernel, the logic in the
latest NFS client source has not changed.
Any suggestions are welcome.
Thanks,
Kinglong Mee