* [PATCH] NFS: Fix O_DIRECT read problem when another write is going on @ 2019-09-30 9:11 Su Yanjun 2019-09-30 18:02 ` [PATCH 0/2] Fix O_DIRECT error handling Trond Myklebust 2019-09-30 18:06 ` [PATCH] NFS: Fix O_DIRECT read problem when another write is going on Trond Myklebust 0 siblings, 2 replies; 6+ messages in thread From: Su Yanjun @ 2019-09-30 9:11 UTC (permalink / raw) To: trond.myklebust; +Cc: linux-nfs, linux-kernel, suyj.fnst In xfstests generic/465 tests failed. Because O_DIRECT r/w use async rpc calls, when r/w rpc calls are running concurrently we may read partial data which is wrong. For example as follows. user buffer /--------\ | |XXXX| rpc0 rpc1 When rpc0 runs it encounters eof so return 0, then another writes something. When rpc1 runs it returns some data. The total data buffer contains wrong data. In this patch we check eof mark for each direct request. If encounters eof then set eof mark in the request, when we meet it again report -EAGAIN error. In nfs_direct_complete we convert -EAGAIN as if read nothing. When the reader issue another read it will read ok. Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> --- fs/nfs/direct.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 222d711..7f737a3 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -93,6 +93,7 @@ struct nfs_direct_req { bytes_left, /* bytes left to be sent */ error; /* any reported error */ struct completion completion; /* wait for i/o completion */ + int eof; /* eof mark in the req */ /* commit state */ struct nfs_mds_commit_info mds_cinfo; /* Storage for cinfo */ @@ -380,6 +381,12 @@ static void nfs_direct_complete(struct nfs_direct_req *dreq) { struct inode *inode = dreq->inode; + /* read partial data just as read nothing */ + if (dreq->error == -EAGAIN) { + dreq->count = 0; + dreq->error = 0; + } + inode_dio_end(inode); if (dreq->iocb) { @@ -413,8 +420,13 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr) if (hdr->good_bytes != 0) nfs_direct_good_bytes(dreq, hdr); - if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) + if (dreq->eof) + dreq->error = -EAGAIN; + + if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) { dreq->error = 0; + dreq->eof = 1; + } spin_unlock(&dreq->lock); -- 2.7.4 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 0/2] Fix O_DIRECT error handling 2019-09-30 9:11 [PATCH] NFS: Fix O_DIRECT read problem when another write is going on Su Yanjun @ 2019-09-30 18:02 ` Trond Myklebust 2019-09-30 18:02 ` [PATCH 1/2] NFS: Fix O_DIRECT accounting of number of bytes read/written Trond Myklebust 2019-09-30 18:06 ` [PATCH] NFS: Fix O_DIRECT read problem when another write is going on Trond Myklebust 1 sibling, 1 reply; 6+ messages in thread From: Trond Myklebust @ 2019-09-30 18:02 UTC (permalink / raw) To: Su Yanjun, Anna Schumaker; +Cc: linux-nfs These patches fix issues with how O_DIRECT handles read and write errors, and also how it handles eof. Trond Myklebust (2): NFS: Fix O_DIRECT accounting of number of bytes read/written NFS: Remove redundant mirror tracking in O_DIRECT fs/nfs/direct.c | 106 ++++++++++++++++-------------------------------- 1 file changed, 36 insertions(+), 70 deletions(-) -- 2.21.0 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] NFS: Fix O_DIRECT accounting of number of bytes read/written 2019-09-30 18:02 ` [PATCH 0/2] Fix O_DIRECT error handling Trond Myklebust @ 2019-09-30 18:02 ` Trond Myklebust 2019-09-30 18:02 ` [PATCH 2/2] NFS: Remove redundant mirror tracking in O_DIRECT Trond Myklebust 0 siblings, 1 reply; 6+ messages in thread From: Trond Myklebust @ 2019-09-30 18:02 UTC (permalink / raw) To: Su Yanjun, Anna Schumaker; +Cc: linux-nfs When a series of O_DIRECT reads or writes are truncated, either due to eof or due to an error, then we should return the number of contiguous bytes that were received/sent starting at the offset specified by the application. Currently, we are failing to correctly check contiguity, and so we're failing the generic/465 in xfstests when the race between the read and write RPCs causes the file to get extended while the 2 reads are outstanding. If the first read RPC call wins the race and returns with eof set, we should treat the second read RPC as being truncated. Reported-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> Fixes: 1ccbad9f9f9bd ("nfs: fix DIO good bytes calculation") Cc: stable@vger.kernel.org # 4.1+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/direct.c | 78 +++++++++++++++++++++++++++---------------------- 1 file changed, 43 insertions(+), 35 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 222d7115db71..98a9a0bcdf38 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -123,32 +123,49 @@ static inline int put_dreq(struct nfs_direct_req *dreq) } static void -nfs_direct_good_bytes(struct nfs_direct_req *dreq, struct nfs_pgio_header *hdr) +nfs_direct_handle_truncated(struct nfs_direct_req *dreq, + const struct nfs_pgio_header *hdr, + ssize_t dreq_len) { - int i; - ssize_t count; + struct nfs_direct_mirror *mirror = &dreq->mirrors[hdr->pgio_mirror_idx]; + + if (!(test_bit(NFS_IOHDR_ERROR, &hdr->flags) || + test_bit(NFS_IOHDR_EOF, &hdr->flags))) + return; + if (dreq->max_count >= dreq_len) { + dreq->max_count = dreq_len; + if (dreq->count > dreq_len) + dreq->count = dreq_len; + + if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) + dreq->error = hdr->error; + else /* Clear outstanding error if this is EOF */ + dreq->error = 0; + } + if (mirror->count > dreq_len) + mirror->count = dreq_len; +} - WARN_ON_ONCE(dreq->count >= dreq->max_count); +static void +nfs_direct_count_bytes(struct nfs_direct_req *dreq, + const struct nfs_pgio_header *hdr) +{ + struct nfs_direct_mirror *mirror = &dreq->mirrors[hdr->pgio_mirror_idx]; + loff_t hdr_end = hdr->io_start + hdr->good_bytes; + ssize_t dreq_len = 0; - if (dreq->mirror_count == 1) { - dreq->mirrors[hdr->pgio_mirror_idx].count += hdr->good_bytes; - dreq->count += hdr->good_bytes; - } else { - /* mirrored writes */ - count = dreq->mirrors[hdr->pgio_mirror_idx].count; - if (count + dreq->io_start < hdr->io_start + hdr->good_bytes) { - count = hdr->io_start + hdr->good_bytes - dreq->io_start; - dreq->mirrors[hdr->pgio_mirror_idx].count = count; - } - /* update the dreq->count by finding the minimum agreed count from all - * mirrors */ - count = dreq->mirrors[0].count; + if (hdr_end > dreq->io_start) + dreq_len = hdr_end - dreq->io_start; - for (i = 1; i < dreq->mirror_count; i++) - count = min(count, dreq->mirrors[i].count); + nfs_direct_handle_truncated(dreq, hdr, dreq_len); - dreq->count = count; - } + if (dreq_len > dreq->max_count) + dreq_len = dreq->max_count; + + if (mirror->count < dreq_len) + mirror->count = dreq_len; + if (dreq->count < dreq_len) + dreq->count = dreq_len; } /* @@ -402,20 +419,12 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr) struct nfs_direct_req *dreq = hdr->dreq; spin_lock(&dreq->lock); - if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) - dreq->error = hdr->error; - if (test_bit(NFS_IOHDR_REDO, &hdr->flags)) { spin_unlock(&dreq->lock); goto out_put; } - if (hdr->good_bytes != 0) - nfs_direct_good_bytes(dreq, hdr); - - if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) - dreq->error = 0; - + nfs_direct_count_bytes(dreq, hdr); spin_unlock(&dreq->lock); while (!list_empty(&hdr->pages)) { @@ -652,6 +661,9 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) nfs_direct_write_scan_commit_list(dreq->inode, &reqs, &cinfo); dreq->count = 0; + dreq->max_count = 0; + list_for_each_entry(req, &reqs, wb_list) + dreq->max_count += req->wb_bytes; dreq->verf.committed = NFS_INVALID_STABLE_HOW; nfs_clear_pnfs_ds_commit_verifiers(&dreq->ds_cinfo); for (i = 0; i < dreq->mirror_count; i++) @@ -791,17 +803,13 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr) nfs_init_cinfo_from_dreq(&cinfo, dreq); spin_lock(&dreq->lock); - - if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) - dreq->error = hdr->error; - if (test_bit(NFS_IOHDR_REDO, &hdr->flags)) { spin_unlock(&dreq->lock); goto out_put; } + nfs_direct_count_bytes(dreq, hdr); if (hdr->good_bytes != 0) { - nfs_direct_good_bytes(dreq, hdr); if (nfs_write_need_commit(hdr)) { if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES) request_commit = true; -- 2.21.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] NFS: Remove redundant mirror tracking in O_DIRECT 2019-09-30 18:02 ` [PATCH 1/2] NFS: Fix O_DIRECT accounting of number of bytes read/written Trond Myklebust @ 2019-09-30 18:02 ` Trond Myklebust 0 siblings, 0 replies; 6+ messages in thread From: Trond Myklebust @ 2019-09-30 18:02 UTC (permalink / raw) To: Su Yanjun, Anna Schumaker; +Cc: linux-nfs We no longer need the extra mirror length tracking in the O_DIRECT code, as we are able to track the maximum contiguous length in dreq->max_count. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/direct.c | 42 ------------------------------------------ 1 file changed, 42 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 98a9a0bcdf38..040a50fd9bf3 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -64,13 +64,6 @@ static struct kmem_cache *nfs_direct_cachep; -/* - * This represents a set of asynchronous requests that we're waiting on - */ -struct nfs_direct_mirror { - ssize_t count; -}; - struct nfs_direct_req { struct kref kref; /* release manager */ @@ -84,9 +77,6 @@ struct nfs_direct_req { atomic_t io_count; /* i/os we're waiting for */ spinlock_t lock; /* protect completion state */ - struct nfs_direct_mirror mirrors[NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX]; - int mirror_count; - loff_t io_start; /* Start offset for I/O */ ssize_t count, /* bytes actually processed */ max_count, /* max expected count */ @@ -127,8 +117,6 @@ nfs_direct_handle_truncated(struct nfs_direct_req *dreq, const struct nfs_pgio_header *hdr, ssize_t dreq_len) { - struct nfs_direct_mirror *mirror = &dreq->mirrors[hdr->pgio_mirror_idx]; - if (!(test_bit(NFS_IOHDR_ERROR, &hdr->flags) || test_bit(NFS_IOHDR_EOF, &hdr->flags))) return; @@ -142,15 +130,12 @@ nfs_direct_handle_truncated(struct nfs_direct_req *dreq, else /* Clear outstanding error if this is EOF */ dreq->error = 0; } - if (mirror->count > dreq_len) - mirror->count = dreq_len; } static void nfs_direct_count_bytes(struct nfs_direct_req *dreq, const struct nfs_pgio_header *hdr) { - struct nfs_direct_mirror *mirror = &dreq->mirrors[hdr->pgio_mirror_idx]; loff_t hdr_end = hdr->io_start + hdr->good_bytes; ssize_t dreq_len = 0; @@ -162,8 +147,6 @@ nfs_direct_count_bytes(struct nfs_direct_req *dreq, if (dreq_len > dreq->max_count) dreq_len = dreq->max_count; - if (mirror->count < dreq_len) - mirror->count = dreq_len; if (dreq->count < dreq_len) dreq->count = dreq_len; } @@ -310,18 +293,6 @@ void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo, cinfo->completion_ops = &nfs_direct_commit_completion_ops; } -static inline void nfs_direct_setup_mirroring(struct nfs_direct_req *dreq, - struct nfs_pageio_descriptor *pgio, - struct nfs_page *req) -{ - int mirror_count = 1; - - if (pgio->pg_ops->pg_get_mirror_count) - mirror_count = pgio->pg_ops->pg_get_mirror_count(pgio, req); - - dreq->mirror_count = mirror_count; -} - static inline struct nfs_direct_req *nfs_direct_req_alloc(void) { struct nfs_direct_req *dreq; @@ -336,7 +307,6 @@ static inline struct nfs_direct_req *nfs_direct_req_alloc(void) INIT_LIST_HEAD(&dreq->mds_cinfo.list); dreq->verf.committed = NFS_INVALID_STABLE_HOW; /* not set yet */ INIT_WORK(&dreq->work, nfs_direct_write_schedule_work); - dreq->mirror_count = 1; spin_lock_init(&dreq->lock); return dreq; @@ -655,7 +625,6 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) LIST_HEAD(reqs); struct nfs_commit_info cinfo; LIST_HEAD(failed); - int i; nfs_init_cinfo_from_dreq(&cinfo, dreq); nfs_direct_write_scan_commit_list(dreq->inode, &reqs, &cinfo); @@ -666,21 +635,12 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) dreq->max_count += req->wb_bytes; dreq->verf.committed = NFS_INVALID_STABLE_HOW; nfs_clear_pnfs_ds_commit_verifiers(&dreq->ds_cinfo); - for (i = 0; i < dreq->mirror_count; i++) - dreq->mirrors[i].count = 0; get_dreq(dreq); nfs_pageio_init_write(&desc, dreq->inode, FLUSH_STABLE, false, &nfs_direct_write_completion_ops); desc.pg_dreq = dreq; - req = nfs_list_entry(reqs.next); - nfs_direct_setup_mirroring(dreq, &desc, req); - if (desc.pg_error < 0) { - list_splice_init(&reqs, &failed); - goto out_failed; - } - list_for_each_entry_safe(req, tmp, &reqs, wb_list) { /* Bump the transmission count */ req->wb_nio++; @@ -698,7 +658,6 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) } nfs_pageio_complete(&desc); -out_failed: while (!list_empty(&failed)) { req = nfs_list_entry(failed.next); nfs_list_remove_request(req); @@ -931,7 +890,6 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, break; } - nfs_direct_setup_mirroring(dreq, &desc, req); if (desc.pg_error < 0) { nfs_free_request(req); result = desc.pg_error; -- 2.21.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] NFS: Fix O_DIRECT read problem when another write is going on 2019-09-30 9:11 [PATCH] NFS: Fix O_DIRECT read problem when another write is going on Su Yanjun 2019-09-30 18:02 ` [PATCH 0/2] Fix O_DIRECT error handling Trond Myklebust @ 2019-09-30 18:06 ` Trond Myklebust 2019-10-07 2:17 ` Su Yanjun 1 sibling, 1 reply; 6+ messages in thread From: Trond Myklebust @ 2019-09-30 18:06 UTC (permalink / raw) To: suyj.fnst; +Cc: linux-nfs, linux-kernel Hi Su, On Mon, 2019-09-30 at 17:11 +0800, Su Yanjun wrote: > In xfstests generic/465 tests failed. Because O_DIRECT r/w use > async rpc calls, when r/w rpc calls are running concurrently we > may read partial data which is wrong. > > For example as follows. > user buffer > /--------\ > > |XXXX| > rpc0 rpc1 > > When rpc0 runs it encounters eof so return 0, then another writes > something. When rpc1 runs it returns some data. The total data > buffer contains wrong data. > > In this patch we check eof mark for each direct request. If > encounters > eof then set eof mark in the request, when we meet it again report > -EAGAIN error. In nfs_direct_complete we convert -EAGAIN as if read > nothing. When the reader issue another read it will read ok. > > Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> > --- > fs/nfs/direct.c | 14 +++++++++++++- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c > index 222d711..7f737a3 100644 > --- a/fs/nfs/direct.c > +++ b/fs/nfs/direct.c > @@ -93,6 +93,7 @@ struct nfs_direct_req { > bytes_left, /* bytes left to be > sent */ > error; /* any reported error > */ > struct completion completion; /* wait for i/o completion */ > + int eof; /* eof mark in the > req */ > > /* commit state */ > struct nfs_mds_commit_info mds_cinfo; /* Storage for cinfo > */ > @@ -380,6 +381,12 @@ static void nfs_direct_complete(struct > nfs_direct_req *dreq) > { > struct inode *inode = dreq->inode; > > + /* read partial data just as read nothing */ > + if (dreq->error == -EAGAIN) { > + dreq->count = 0; > + dreq->error = 0; > + } > + > inode_dio_end(inode); > > if (dreq->iocb) { > @@ -413,8 +420,13 @@ static void nfs_direct_read_completion(struct > nfs_pgio_header *hdr) > if (hdr->good_bytes != 0) > nfs_direct_good_bytes(dreq, hdr); > > - if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) > + if (dreq->eof) > + dreq->error = -EAGAIN; > + > + if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) { > dreq->error = 0; > + dreq->eof = 1; > + } > > spin_unlock(&dreq->lock); > Thanks for looking into this issue. I agree with your analysis of what is going wrong in generic/465. However, I think the problem is greater than just EOF. I think we also need to look at the generic error handling, and ensure that it handles a truncated RPC call in the middle of a series of calls correctly. Please see the two patches I sent you just now and check if they fix the problem for you. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] NFS: Fix O_DIRECT read problem when another write is going on 2019-09-30 18:06 ` [PATCH] NFS: Fix O_DIRECT read problem when another write is going on Trond Myklebust @ 2019-10-07 2:17 ` Su Yanjun 0 siblings, 0 replies; 6+ messages in thread From: Su Yanjun @ 2019-10-07 2:17 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs, linux-kernel 在 2019/10/1 2:06, Trond Myklebust 写道: > Hi Su, > > On Mon, 2019-09-30 at 17:11 +0800, Su Yanjun wrote: >> In xfstests generic/465 tests failed. Because O_DIRECT r/w use >> async rpc calls, when r/w rpc calls are running concurrently we >> may read partial data which is wrong. >> >> For example as follows. >> user buffer >> /--------\ >>> |XXXX| >> rpc0 rpc1 >> >> When rpc0 runs it encounters eof so return 0, then another writes >> something. When rpc1 runs it returns some data. The total data >> buffer contains wrong data. >> >> In this patch we check eof mark for each direct request. If >> encounters >> eof then set eof mark in the request, when we meet it again report >> -EAGAIN error. In nfs_direct_complete we convert -EAGAIN as if read >> nothing. When the reader issue another read it will read ok. >> >> Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> >> --- >> fs/nfs/direct.c | 14 +++++++++++++- >> 1 file changed, 13 insertions(+), 1 deletion(-) >> >> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c >> index 222d711..7f737a3 100644 >> --- a/fs/nfs/direct.c >> +++ b/fs/nfs/direct.c >> @@ -93,6 +93,7 @@ struct nfs_direct_req { >> bytes_left, /* bytes left to be >> sent */ >> error; /* any reported error >> */ >> struct completion completion; /* wait for i/o completion */ >> + int eof; /* eof mark in the >> req */ >> >> /* commit state */ >> struct nfs_mds_commit_info mds_cinfo; /* Storage for cinfo >> */ >> @@ -380,6 +381,12 @@ static void nfs_direct_complete(struct >> nfs_direct_req *dreq) >> { >> struct inode *inode = dreq->inode; >> >> + /* read partial data just as read nothing */ >> + if (dreq->error == -EAGAIN) { >> + dreq->count = 0; >> + dreq->error = 0; >> + } >> + >> inode_dio_end(inode); >> >> if (dreq->iocb) { >> @@ -413,8 +420,13 @@ static void nfs_direct_read_completion(struct >> nfs_pgio_header *hdr) >> if (hdr->good_bytes != 0) >> nfs_direct_good_bytes(dreq, hdr); >> >> - if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) >> + if (dreq->eof) >> + dreq->error = -EAGAIN; >> + >> + if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) { >> dreq->error = 0; >> + dreq->eof = 1; >> + } >> >> spin_unlock(&dreq->lock); >> > Thanks for looking into this issue. I agree with your analysis of what > is going wrong in generic/465. > > However, I think the problem is greater than just EOF. I think we also > need to look at the generic error handling, and ensure that it handles > a truncated RPC call in the middle of a series of calls correctly. > > Please see the two patches I sent you just now and check if they fix > the problem for you. The patchset you sent works for generic/465. Thanks a lot ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-10-07 2:17 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-09-30 9:11 [PATCH] NFS: Fix O_DIRECT read problem when another write is going on Su Yanjun 2019-09-30 18:02 ` [PATCH 0/2] Fix O_DIRECT error handling Trond Myklebust 2019-09-30 18:02 ` [PATCH 1/2] NFS: Fix O_DIRECT accounting of number of bytes read/written Trond Myklebust 2019-09-30 18:02 ` [PATCH 2/2] NFS: Remove redundant mirror tracking in O_DIRECT Trond Myklebust 2019-09-30 18:06 ` [PATCH] NFS: Fix O_DIRECT read problem when another write is going on Trond Myklebust 2019-10-07 2:17 ` Su Yanjun
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).