* CLOSE/OPEN race
From: Benjamin Coddington @ 2016-11-12 11:08 UTC
To: Linux NFS Mailing List

I've been seeing the following on a modified version of generic/089 that gets
the client stuck sending LOCK with NFS4ERR_OLD_STATEID.

1. Client has open stateid A, sends a CLOSE
2. Client sends OPEN with same owner
3. Client sends another OPEN with same owner
4. Client gets a reply to OPEN in 3, stateid is B.2 (stateid B sequence 2)
5. Client does LOCK,LOCKU,FREE_STATEID from B.2
6. Client gets a reply to CLOSE in 1
7. Client gets reply to OPEN in 2, stateid is B.1
8. Client sends LOCK with B.1 - OLD_STATEID, now stuck in a loop

The CLOSE response in 6 causes us to clear NFS_OPEN_STATE, so that the OPEN
response in 7 is able to update the open_stateid even though it has a lower
sequence number.

I think this case could be handled by never updating the open_stateid if the
stateids match but the sequence number of the new state is less than the
current open_stateid.

Any thoughts?

Ben
* Re: CLOSE/OPEN race
From: Jeff Layton @ 2016-11-12 12:54 UTC
To: Benjamin Coddington, Linux NFS Mailing List

On Sat, 2016-11-12 at 06:08 -0500, Benjamin Coddington wrote:
> I've been seeing the following on a modified version of generic/089
> that gets the client stuck sending LOCK with NFS4ERR_OLD_STATEID.
>
> 1. Client has open stateid A, sends a CLOSE
> 2. Client sends OPEN with same owner
> 3. Client sends another OPEN with same owner
> 4. Client gets a reply to OPEN in 3, stateid is B.2 (stateid B sequence 2)
> 5. Client does LOCK,LOCKU,FREE_STATEID from B.2
> 6. Client gets a reply to CLOSE in 1
> 7. Client gets reply to OPEN in 2, stateid is B.1
> 8. Client sends LOCK with B.1 - OLD_STATEID, now stuck in a loop
>
> The CLOSE response in 6 causes us to clear NFS_OPEN_STATE, so that the OPEN
> response in 7 is able to update the open_stateid even though it has a lower
> sequence number.
>
> I think this case could be handled by never updating the open_stateid if
> the stateids match but the sequence number of the new state is less than
> the current open_stateid.

What kernel is this on?

Yes, that seems wrong. The client should be picking B.2 for the open
stateid to use. I think the decision of whether to take a seqid is made
in nfs_need_update_open_stateid. The logic in there looks correct to me
at first glance, though.

--
Jeff Layton <jlayton@redhat.com>
* Re: CLOSE/OPEN race
From: Benjamin Coddington @ 2016-11-12 15:31 UTC
To: Jeff Layton; +Cc: Linux NFS Mailing List

On 12 Nov 2016, at 7:54, Jeff Layton wrote:
> What kernel is this on?

On v4.9-rc2 with a couple of fixups. Without them, I can't test long enough
to reproduce this race. I don't think any of those are involved in this
problem, though.

> Yes, that seems wrong. The client should be picking B.2 for the open
> stateid to use. I think the decision of whether to take a seqid is made
> in nfs_need_update_open_stateid. The logic in there looks correct to me
> at first glance, though.

nfs_need_update_open_stateid() will return true if NFS_OPEN_STATE is unset.
That's the precondition set up by steps 1-6. Perhaps it should not update
the stateid if they match but the sequence number is less, and still set
NFS_OPEN_STATE once more. That will fix _this_ case. Are there other cases
where that would be a problem?

Ben
* Re: CLOSE/OPEN race
From: Jeff Layton @ 2016-11-12 16:52 UTC
To: Benjamin Coddington; +Cc: Linux NFS Mailing List

On Sat, 2016-11-12 at 10:31 -0500, Benjamin Coddington wrote:
> nfs_need_update_open_stateid() will return true if NFS_OPEN_STATE is
> unset. That's the precondition set up by steps 1-6. Perhaps it should not
> update the stateid if they match but the sequence number is less, and
> still set NFS_OPEN_STATE once more. That will fix _this_ case. Are there
> other cases where that would be a problem?
>
> Ben

That seems wrong. The only close was sent in step 1, and that was for a
completely different stateid (A rather than B). It seems likely that that
is where the bug is.

--
Jeff Layton <jlayton@redhat.com>
* Re: CLOSE/OPEN race
From: Benjamin Coddington @ 2016-11-12 18:03 UTC
To: Jeff Layton; +Cc: Linux NFS Mailing List

On 12 Nov 2016, at 11:52, Jeff Layton wrote:
> That seems wrong.

I'm not sure what you mean: what seems wrong?

> The only close was sent in step 1, and that was for a
> completely different stateid (A rather than B). It seems likely that
> that is where the bug is.

I'm still not sure what point you're trying to make. Even though the close
was sent in step 1, the response wasn't processed until step 6.
* Re: CLOSE/OPEN race
From: Jeff Layton @ 2016-11-12 21:16 UTC
To: Benjamin Coddington; +Cc: Linux NFS Mailing List

On Sat, 2016-11-12 at 13:03 -0500, Benjamin Coddington wrote:
> > That seems wrong.
>
> I'm not sure what you mean: what seems wrong?

Sorry, it seems wrong that the client would issue the LOCK with B.1 there.

> > The only close was sent in step 1, and that was for a
> > completely different stateid (A rather than B). It seems likely that
> > that is where the bug is.
>
> I'm still not sure what point you're trying to make. Even though the
> close was sent in step 1, the response wasn't processed until step 6.

Not really a point per se; I was just saying where I think the bug might be.

When you issue a CLOSE, you issue it against a particular stateid (stateid
"A" in this case). Once the open stateid has been superseded by "B", the
closing of "A" should have no effect.

Perhaps nfs_clear_open_stateid needs to check whether the open stateid has
been superseded before doing its thing?

--
Jeff Layton <jlayton@redhat.com>
* Re: CLOSE/OPEN race
From: Jeff Layton @ 2016-11-13 2:56 UTC
To: Benjamin Coddington; +Cc: Linux NFS Mailing List

On Sat, 2016-11-12 at 16:16 -0500, Jeff Layton wrote:
> When you issue a CLOSE, you issue it against a particular stateid (stateid
> "A" in this case). Once the open stateid has been superseded by "B", the
> closing of "A" should have no effect.
>
> Perhaps nfs_clear_open_stateid needs to check whether the open stateid has
> been superseded before doing its thing?

Ok, I see something that might be a problem in this call in nfs4_close_done:

	nfs_clear_open_stateid(state, &calldata->arg.stateid,
			res_stateid, calldata->arg.fmode);

Note that we pass two nfs4_stateids to this call. The first is the stateid
that got sent in the CLOSE call, and the second is the stateid that came
back in the CLOSE response.

RFC 5661 and RFC 7530 both indicate that the stateid in a CLOSE response
should be ignored.

So, I think a patch like this may be in order. As to whether it will fix
this bug, I sort of doubt it, but it might not hurt to test it out?

----------------------8<--------------------------

[RFC PATCH] nfs: properly ignore the stateid in a CLOSE response

Signed-off-by: Jeff Layton
---
 fs/nfs/nfs4proc.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 7897826d7c51..58413bd0aae2 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1451,7 +1451,6 @@ static void nfs_resync_open_stateid_locked(struct nfs4_state *state)
 }
 
 static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
-		nfs4_stateid *arg_stateid,
 		nfs4_stateid *stateid, fmode_t fmode)
 {
 	clear_bit(NFS_O_RDWR_STATE, &state->flags);
@@ -1467,12 +1466,8 @@ static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
 		clear_bit(NFS_O_WRONLY_STATE, &state->flags);
 		clear_bit(NFS_OPEN_STATE, &state->flags);
 	}
-	if (stateid == NULL)
-		return;
 	/* Handle races with OPEN */
-	if (!nfs4_stateid_match_other(arg_stateid, &state->open_stateid) ||
-	    (nfs4_stateid_match_other(stateid, &state->open_stateid) &&
-	    !nfs4_stateid_is_newer(stateid, &state->open_stateid))) {
+	if (!nfs4_stateid_match_other(stateid, &state->open_stateid)) {
 		nfs_resync_open_stateid_locked(state);
 		return;
 	}
@@ -1482,11 +1477,10 @@ static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
 }
 
 static void nfs_clear_open_stateid(struct nfs4_state *state,
-	nfs4_stateid *arg_stateid,
 	nfs4_stateid *stateid, fmode_t fmode)
 {
 	write_seqlock(&state->seqlock);
-	nfs_clear_open_stateid_locked(state, arg_stateid, stateid, fmode);
+	nfs_clear_open_stateid_locked(state, stateid, fmode);
 	write_sequnlock(&state->seqlock);
 	if (test_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags))
 		nfs4_schedule_state_manager(state->owner->so_server->nfs_client);
@@ -3042,7 +3036,6 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
 	struct nfs4_closedata *calldata = data;
 	struct nfs4_state *state = calldata->state;
 	struct nfs_server *server = NFS_SERVER(calldata->inode);
-	nfs4_stateid *res_stateid = NULL;
 
 	dprintk("%s: begin!\n", __func__);
 	if (!nfs4_sequence_done(task, &calldata->res.seq_res))
@@ -3053,7 +3046,6 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
 	 */
 	switch (task->tk_status) {
 	case 0:
-		res_stateid = &calldata->res.stateid;
 		if (calldata->roc)
 			pnfs_roc_set_barrier(state->inode,
 					calldata->roc_barrier);
@@ -3081,7 +3073,7 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
 		}
 	}
 	nfs_clear_open_stateid(state, &calldata->arg.stateid,
-			res_stateid, calldata->arg.fmode);
+			calldata->arg.fmode);
 out_release:
 	nfs_release_seqid(calldata->arg.seqid);
 	nfs_refresh_inode(calldata->inode, calldata->res.fattr);
-- 
2.9.3
* Re: CLOSE/OPEN race
From: Benjamin Coddington @ 2016-11-13 13:34 UTC
To: Jeff Layton; +Cc: Linux NFS Mailing List

On 12 Nov 2016, at 21:56, Jeff Layton wrote:
> Ok, I see something that might be a problem in this call in
> nfs4_close_done:
>
>	nfs_clear_open_stateid(state, &calldata->arg.stateid,
>			res_stateid, calldata->arg.fmode);
>
> Note that we pass two nfs4_stateids to this call. The first is the
> stateid that got sent in the CLOSE call, and the second is the stateid
> that came back in the CLOSE response.
>
> RFC 5661 and RFC 7530 both indicate that the stateid in a CLOSE response
> should be ignored.
>
> So, I think a patch like this may be in order. As to whether it will fix
> this bug, I sort of doubt it, but it might not hurt to test it out?

Doesn't this undo the fix in 4a1e2feb9d24 ("NFSv4.1: Fix a protocol issue
with CLOSE stateids")?

I don't think it will help with the race either, since I think the issue is
that the race above unsets NFS_OPEN_STATE, and then that is reason enough
to overwrite the existing open_stateid, even if it is going to overwrite
with an older stateid. What CLOSE does with its stateid is unimportant to
that problem.

In the interest of science, I'll kick off a test with this patch. It should
take a few hours to reproduce, but I'm also fairly busy today, so I'll
report back this evening most likely. Thanks for the patch.

Ben
* Re: CLOSE/OPEN race 2016-11-13 13:34 ` Benjamin Coddington @ 2016-11-13 14:22 ` Benjamin Coddington 2016-11-13 14:33 ` Jeff Layton 1 sibling, 0 replies; 17+ messages in thread From: Benjamin Coddington @ 2016-11-13 14:22 UTC (permalink / raw) To: Jeff Layton; +Cc: List Linux NFS Mailing On 13 Nov 2016, at 8:34, Benjamin Coddington wrote: > > On 12 Nov 2016, at 21:56, Jeff Layton wrote: > >> On Sat, 2016-11-12 at 16:16 -0500, Jeff Layton wrote: >>> On Sat, 2016-11-12 at 13:03 -0500, Benjamin Coddington wrote: >>>> >>>> On 12 Nov 2016, at 11:52, Jeff Layton wrote: >>>> >>>>> >>>>> On Sat, 2016-11-12 at 10:31 -0500, Benjamin Coddington wrote: >>>>>> >>>>>> On 12 Nov 2016, at 7:54, Jeff Layton wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Sat, 2016-11-12 at 06:08 -0500, Benjamin Coddington wrote: >>>>>>>> >>>>>>>> >>>>>>>> I've been seeing the following on a modified version of >>>>>>>> generic/089 >>>>>>>> that gets the client stuck sending LOCK with >>>>>>>> NFS4ERR_OLD_STATEID. >>>>>>>> >>>>>>>> 1. Client has open stateid A, sends a CLOSE >>>>>>>> 2. Client sends OPEN with same owner >>>>>>>> 3. Client sends another OPEN with same owner >>>>>>>> 4. Client gets a reply to OPEN in 3, stateid is B.2 (stateid B >>>>>>>> sequence 2) >>>>>>>> 5. Client does LOCK,LOCKU,FREE_STATEID from B.2 >>>>>>>> 6. Client gets a reply to CLOSE in 1 >>>>>>>> 7. Client gets reply to OPEN in 2, stateid is B.1 >>>>>>>> 8. Client sends LOCK with B.1 - OLD_STATEID, now stuck in a >>>>>>>> loop >>>>>>>> >>>>>>>> The CLOSE response in 6 causes us to clear NFS_OPEN_STATE, so >>>>>>>> that >>>>>>>> the OPEN >>>>>>>> response in 7 is able to update the open_stateid even though it >>>>>>>> has a >>>>>>>> lower >>>>>>>> sequence number. >>>>>>>> >>>>>>>> I think this case could be handled by never updating the >>>>>>>> open_stateid >>>>>>>> if the >>>>>>>> stateids match but the sequence number of the new state is less >>>>>>>> than >>>>>>>> the >>>>>>>> current open_state. 
>>>>>>>> >>>>>>> >>>>>>> What kernel is this on? >>>>>> >>>>>> On v4.9-rc2 with a couple fixups. Without them, I can't test >>>>>> long >>>>>> enough to >>>>>> reproduce this race. I don't think any of those are involved in >>>>>> this >>>>>> problem, though. >>>>>> >>>>>>> >>>>>>> >>>>>>> Yes, that seems wrong. The client should be picking B.2 for the >>>>>>> open >>>>>>> stateid to use. I think that decision of whether to take a seqid >>>>>>> is >>>>>>> made >>>>>>> in nfs_need_update_open_stateid. The logic in there looks >>>>>>> correct to >>>>>>> me >>>>>>> at first glance though. >>>>>> >>>>>> nfs_need_update_open_stateid() will return true if NFS_OPEN_STATE >>>>>> is >>>>>> unset. >>>>>> That's the precondition set up by steps 1-6. Perhaps it should >>>>>> not >>>>>> update >>>>>> the stateid if they match but the sequence number is less, and >>>>>> still set >>>>>> NFS_OPEN_STATE once more. That will fix _this_ case. Are there >>>>>> other >>>>>> cases >>>>>> where that would be a problem? >>>>>> >>>>>> Ben >>>>> >>>>> That seems wrong. >>>> >>>> I'm not sure what you mean: what seems wrong? >>>> >>> >>> Sorry, it seems wrong that the client would issue the LOCK with B.1 >>> there. >>> >>>>> >>>>> The only close was sent in step 1, and that was for a >>>>> completely different stateid (A rather than B). It seems likely >>>>> that >>>>> that is where the bug is. >>>> >>>> I'm still not sure what point you're trying to make.. >>>> >>>> Even though the close was sent in step 1, the response wasn't >>>> processed >>>> until step 6.. >>> >>> Not really a point per-se, I was just saying where I think the bug >>> might >>> be... >>> >>> When you issue a CLOSE, you issue it vs. a particular stateid >>> (stateid >>> "A" in this case). Once the open stateid has been superseded by "B", >>> the >>> closing of "A" should have no effect. 
>>> >>> Perhaps nfs_clear_open_stateid needs to check and see whether the >>> open >>> stateid has been superseded before doing its thing? >>> >> >> Ok, I see something that might be a problem in this call in >> nfs4_close_done: >> >> nfs_clear_open_stateid(state, &calldata->arg.stateid, >> res_stateid, >> calldata->arg.fmode); >> >> Note that we pass two nfs4_stateids to this call. The first is the >> stateid that got sent in the CLOSE call, and the second is the >> stateid >> that came back in the CLOSE response. >> >> RFC5661 and RFC7530 both indicate that the stateid in a CLOSE >> response >> should be ignored. >> >> So, I think a patch like this may be in order. As to whether it will >> fix this bug, I sort of doubt it, but it might not hurt to test it >> out? > > Doesn't this undo the fix in 4a1e2feb9d24 ("NFSv4.1: Fix a protocol > issue > with CLOSE stateids")? > > I don't think it will help with the race either, since I think the > issue is > that the race above unsets NFS_OPEN_STATE, then that is reason enough > to > overwrite the existing open_state, even if it is going to overwrite > with an > older stateid. What close does with its stateid is unimportant to > that > problem. > > In the interest of science, I'll kick off a test with this patch. It > should > take a few hours to reproduce, but I'm also fairly busy today, so I'll > report back this evening most likely. Thanks for the patch. I got lucky and it reproduced in about 10 minutes with this patch. Ben ^ permalink raw reply [flat|nested] 17+ messages in thread
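The guard proposed above - never let a stateid from the same series with a lower sequence number replace the current open stateid - hinges on a wraparound-safe seqid comparison. A minimal sketch in C, in the spirit of the kernel's nfs4_stateid_is_newer(); the names here are illustrative, not the fs/nfs code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Wraparound-safe seqid comparison using serial-number arithmetic: cast
 * the unsigned difference to signed, so a seqid that recently wrapped
 * past zero still compares as newer than one just below the wrap point.
 */
static bool seqid_is_newer(uint32_t s1, uint32_t s2)
{
	return (int32_t)(s1 - s2) > 0;
}

/*
 * The proposed guard: if the incoming open stateid belongs to the same
 * series (same "other" field) but does not carry a newer seqid, refuse
 * the update.  This is what would reject B.1 arriving after B.2 in
 * step 7 of the race described above.
 */
static bool should_update_open_stateid(bool same_other,
				       uint32_t cur_seqid, uint32_t new_seqid)
{
	if (same_other && !seqid_is_newer(new_seqid, cur_seqid))
		return false;
	return true;
}
```

A stateid from a different series (a fresh open) is always accepted; only a stale member of the current series is turned away.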
* Re: CLOSE/OPEN race 2016-11-13 13:34 ` Benjamin Coddington 2016-11-13 14:22 ` Benjamin Coddington @ 2016-11-13 14:33 ` Jeff Layton 1 sibling, 0 replies; 17+ messages in thread From: Jeff Layton @ 2016-11-13 14:33 UTC (permalink / raw) To: Benjamin Coddington; +Cc: List Linux NFS Mailing On Sun, 2016-11-13 at 08:34 -0500, Benjamin Coddington wrote: > > On 12 Nov 2016, at 21:56, Jeff Layton wrote: > > > On Sat, 2016-11-12 at 16:16 -0500, Jeff Layton wrote: > > > On Sat, 2016-11-12 at 13:03 -0500, Benjamin Coddington wrote: > > > > > > > > On 12 Nov 2016, at 11:52, Jeff Layton wrote: > > > > > > > > > > > > > > On Sat, 2016-11-12 at 10:31 -0500, Benjamin Coddington wrote: > > > > > > > > > > > > On 12 Nov 2016, at 7:54, Jeff Layton wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sat, 2016-11-12 at 06:08 -0500, Benjamin Coddington wrote: > > > > > > > > > > > > > > > > > > > > > > > > I've been seeing the following on a modified version of > > > > > > > > generic/089 > > > > > > > > that gets the client stuck sending LOCK with > > > > > > > > NFS4ERR_OLD_STATEID. > > > > > > > > > > > > > > > > 1. Client has open stateid A, sends a CLOSE > > > > > > > > 2. Client sends OPEN with same owner > > > > > > > > 3. Client sends another OPEN with same owner > > > > > > > > 4. Client gets a reply to OPEN in 3, stateid is B.2 (stateid B > > > > > > > > sequence 2) > > > > > > > > 5. Client does LOCK,LOCKU,FREE_STATEID from B.2 > > > > > > > > 6. Client gets a reply to CLOSE in 1 > > > > > > > > 7. Client gets reply to OPEN in 2, stateid is B.1 > > > > > > > > 8. Client sends LOCK with B.1 - OLD_STATEID, now stuck in a loop > > > > > > > > > > > > > > > > The CLOSE response in 6 causes us to clear NFS_OPEN_STATE, so > > > > > > > > that > > > > > > > > the OPEN > > > > > > > > response in 7 is able to update the open_stateid even though it > > > > > > > > has a > > > > > > > > lower > > > > > > > > sequence number. 
> > > > > > > > > > > > > > > > I think this case could be handled by never updating the > > > > > > > > open_stateid > > > > > > > > if the > > > > > > > > stateids match but the sequence number of the new state is less > > > > > > > > than > > > > > > > > the > > > > > > > > current open_state. > > > > > > > > > > > > > > > > > > > > > > What kernel is this on? > > > > > > > > > > > > On v4.9-rc2 with a couple fixups. Without them, I can't test long > > > > > > enough to > > > > > > reproduce this race. I don't think any of those are involved in > > > > > > this > > > > > > problem, though. > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, that seems wrong. The client should be picking B.2 for the > > > > > > > open > > > > > > > stateid to use. I think that decision of whether to take a seqid > > > > > > > is > > > > > > > made > > > > > > > in nfs_need_update_open_stateid. The logic in there looks > > > > > > > correct to > > > > > > > me > > > > > > > at first glance though. > > > > > > > > > > > > nfs_need_update_open_stateid() will return true if NFS_OPEN_STATE > > > > > > is > > > > > > unset. > > > > > > That's the precondition set up by steps 1-6. Perhaps it should > > > > > > not > > > > > > update > > > > > > the stateid if they match but the sequence number is less, and > > > > > > still set > > > > > > NFS_OPEN_STATE once more. That will fix _this_ case. Are there > > > > > > other > > > > > > cases > > > > > > where that would be a problem? > > > > > > > > > > > > Ben > > > > > > > > > > That seems wrong. > > > > > > > > I'm not sure what you mean: what seems wrong? > > > > > > > > > > Sorry, it seems wrong that the client would issue the LOCK with B.1 > > > there. > > > > > > > > > > > > > The only close was sent in step 1, and that was for a > > > > > completely different stateid (A rather than B). It seems likely > > > > > that > > > > > that is where the bug is. > > > > > > > > I'm still not sure what point you're trying to make.. 
> > > > > > > > Even though the close was sent in step 1, the response wasn't > > > > processed > > > > until step 6.. > > > > > > Not really a point per-se, I was just saying where I think the bug > > > might > > > be... > > > > > > When you issue a CLOSE, you issue it vs. a particular stateid > > > (stateid > > > "A" in this case). Once the open stateid has been superseded by "B", > > > the > > > closing of "A" should have no effect. > > > > > > Perhaps nfs_clear_open_stateid needs to check and see whether the > > > open > > > stateid has been superseded before doing its thing? > > > > > > > Ok, I see something that might be a problem in this call in > > nfs4_close_done: > > > > nfs_clear_open_stateid(state, &calldata->arg.stateid, > > res_stateid, > > calldata->arg.fmode); > > > > Note that we pass two nfs4_stateids to this call. The first is the > > stateid that got sent in the CLOSE call, and the second is the stateid > > that came back in the CLOSE response. > > > > RFC5661 and RFC7530 both indicate that the stateid in a CLOSE response > > should be ignored. > > > > So, I think a patch like this may be in order. As to whether it will > > fix this bug, I sort of doubt it, but it might not hurt to test it > > out? > > Doesn't this undo the fix in 4a1e2feb9d24 ("NFSv4.1: Fix a protocol > issue > with CLOSE stateids")? > I don't think so. The (updated) specs are pretty clear that the stateid in a CLOSE response is not to be trusted in any way. I think we're better off just relying on the stateid that was sent in the request. On a related note, we probably need to fix knfsd not to send anything in there as well, but we probably need to think about older clients there as well. Thanks for testing the patch. I'm not too surprised this didn't help here though. It didn't really change the logic appreciably, unless you have a server sending weird stateids in the CLOSE response. 
> I don't think it will help with the race either, since I think the issue is
> that the race above unsets NFS_OPEN_STATE, then that is reason enough to
> overwrite the existing open_state, even if it is going to overwrite with an
> older stateid.  What close does with its stateid is unimportant to that
> problem.
>
> In the interest of science, I'll kick off a test with this patch.  It should
> take a few hours to reproduce, but I'm also fairly busy today, so I'll
> report back this evening most likely.  Thanks for the patch.
>
> Ben
>
> > ----------------------8<--------------------------
> >
> > [RFC PATCH] nfs: properly ignore the stateid in a CLOSE response
> >
> > Signed-off-by: Jeff Layton
> > ---
> >  fs/nfs/nfs4proc.c | 14 +++-----------
> >  1 file changed, 3 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 7897826d7c51..58413bd0aae2 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -1451,7 +1451,6 @@ static void nfs_resync_open_stateid_locked(struct nfs4_state *state)
> >  }
> >
> >  static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
> > -		nfs4_stateid *arg_stateid,
> >  		nfs4_stateid *stateid, fmode_t fmode)
> >  {
> >  	clear_bit(NFS_O_RDWR_STATE, &state->flags);
> > @@ -1467,12 +1466,8 @@ static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
> >  		clear_bit(NFS_O_WRONLY_STATE, &state->flags);
> >  		clear_bit(NFS_OPEN_STATE, &state->flags);
> >  	}
> > -	if (stateid == NULL)
> > -		return;
> >  	/* Handle races with OPEN */
> > -	if (!nfs4_stateid_match_other(arg_stateid, &state->open_stateid) ||
> > -	    (nfs4_stateid_match_other(stateid, &state->open_stateid) &&
> > -	     !nfs4_stateid_is_newer(stateid, &state->open_stateid))) {
> > +	if (!nfs4_stateid_match_other(stateid, &state->open_stateid)) {
> >  		nfs_resync_open_stateid_locked(state);
> >  		return;
> >  	}
> > @@ -1482,11 +1477,10 @@ static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
> >  }
> >
> >  static void nfs_clear_open_stateid(struct nfs4_state *state,
> > -		nfs4_stateid *arg_stateid,
> >  		nfs4_stateid *stateid, fmode_t fmode)
> >  {
> >  	write_seqlock(&state->seqlock);
> > -	nfs_clear_open_stateid_locked(state, arg_stateid, stateid, fmode);
> > +	nfs_clear_open_stateid_locked(state, stateid, fmode);
> >  	write_sequnlock(&state->seqlock);
> >  	if (test_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags))
> >  		nfs4_schedule_state_manager(state->owner->so_server->nfs_client);
> > @@ -3042,7 +3036,6 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
> >  	struct nfs4_closedata *calldata = data;
> >  	struct nfs4_state *state = calldata->state;
> >  	struct nfs_server *server = NFS_SERVER(calldata->inode);
> > -	nfs4_stateid *res_stateid = NULL;
> >
> >  	dprintk("%s: begin!\n", __func__);
> >  	if (!nfs4_sequence_done(task, &calldata->res.seq_res))
> > @@ -3053,7 +3046,6 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
> >  	 */
> >  	switch (task->tk_status) {
> >  	case 0:
> > -		res_stateid = &calldata->res.stateid;
> >  		if (calldata->roc)
> >  			pnfs_roc_set_barrier(state->inode,
> >  					calldata->roc_barrier);
> > @@ -3081,7 +3073,7 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
> >  		}
> >  	}
> >  	nfs_clear_open_stateid(state, &calldata->arg.stateid,
> > -			res_stateid, calldata->arg.fmode);
> > +			calldata->arg.fmode);
> >  out_release:
> >  	nfs_release_seqid(calldata->arg.seqid);
> >  	nfs_refresh_inode(calldata->inode, calldata->res.fattr);
> > --
> > 2.9.3

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread
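The patch above drops the CLOSE response stateid entirely and keys the race check on the stateid the client itself sent. The distinction it relies on is that an NFSv4 stateid is a 4-byte seqid plus a 12-byte "other" field, and a nfs4_stateid_match_other()-style comparison looks only at the latter. A toy model (the struct layout and names are illustrative, not the kernel's nfs4_stateid):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy NFSv4 stateid (see RFC 7530, section 9.1.4): 4-byte seqid plus
 * 12-byte "other".  Just enough to show why B.1 and B.2 "match" as the
 * same open-state series while A and B do not.
 */
struct toy_stateid {
	uint32_t seqid;
	uint8_t  other[12];
};

/* Compare only the "other" bytes: are these the same stateid series? */
static bool toy_match_other(const struct toy_stateid *a,
			    const struct toy_stateid *b)
{
	return memcmp(a->other, b->other, sizeof(a->other)) == 0;
}
```

Under this model, the CLOSE sent against A cannot be confused with the open stateid B no matter which seqid either carries.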
* Re: CLOSE/OPEN race 2016-11-13 2:56 ` Jeff Layton 2016-11-13 13:34 ` Benjamin Coddington @ 2016-11-13 14:47 ` Trond Myklebust 2016-11-14 14:53 ` Benjamin Coddington 1 sibling, 1 reply; 17+ messages in thread From: Trond Myklebust @ 2016-11-13 14:47 UTC (permalink / raw) To: bcodding, jlayton; +Cc: linux-nfs

On Sat, 2016-11-12 at 21:56 -0500, Jeff Layton wrote:
[...]
> Ok, I see something that might be a problem in this call in
> nfs4_close_done:
>
>        nfs_clear_open_stateid(state, &calldata->arg.stateid,
>                        res_stateid, calldata->arg.fmode);
>
> Note that we pass two nfs4_stateids to this call. The first is the
> stateid that got sent in the CLOSE call, and the second is the stateid
> that came back in the CLOSE response.
>
> RFC5661 and RFC7530 both indicate that the stateid in a CLOSE response
> should be ignored.
>
> So, I think a patch like this may be in order. As to whether it will
> fix this bug, I sort of doubt it, but it might not hurt to test it
> out?
[...]
> 	/* Handle races with OPEN */
> -	if (!nfs4_stateid_match_other(arg_stateid, &state->open_stateid) ||
> -	    (nfs4_stateid_match_other(stateid, &state->open_stateid) &&
> -	     !nfs4_stateid_is_newer(stateid, &state->open_stateid))) {
> +	if (!nfs4_stateid_match_other(stateid, &state->open_stateid)) {

No. I think what we should be doing here is

1) if (nfs4_stateid_match_other(arg_stateid, &state->open_stateid) then
just ignore the result and return immediately, because it applies to a
completely different stateid.

2) if (nfs4_stateid_match_other(stateid, &state->open_stateid)
&& !nfs4_stateid_is_newer(stateid, &state->open_stateid))) then resync,
because this was likely an OPEN_DOWNGRADE that has raced with one or
more OPEN calls.

Note that the reason why we've been careful about this previously is
because RFC3530 did not enforce the "seqid:other" definition of
stateids. Now that RFC7530 section 9.1.4.2 has strengthened the
definition of stateids for NFSv4.0, we can assume the above holds for
all existing minor versions of NFSv4.

^ permalink raw reply	[flat|nested] 17+ messages in thread
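The two rules Trond lays out (with the missing negation on the first test, as Ben points out later in the thread) can be sketched as a small decision function; the enum and parameter names here are hypothetical, not fs/nfs code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the CLOSE-reply policy described above:
 *  1) the CLOSE was for a stateid series we no longer hold -> ignore
 *     the reply entirely;
 *  2) the reply stateid matches the current open stateid but is not
 *     newer -> likely an OPEN_DOWNGRADE that raced with one or more
 *     OPEN calls, so resync with the server;
 *  3) otherwise the CLOSE applies cleanly and the state can be cleared.
 */
enum close_reply_action { IGNORE_REPLY, RESYNC_STATE, CLEAR_STATE };

static enum close_reply_action close_reply_policy(bool arg_matches_open,
						  bool res_matches_open,
						  bool res_is_newer)
{
	if (!arg_matches_open)
		return IGNORE_REPLY;
	if (res_matches_open && !res_is_newer)
		return RESYNC_STATE;
	return CLEAR_STATE;
}
```

Rule 1 is what would neutralize the late CLOSE reply for stateid A in step 6 of the original race: by then the client's open stateid is B, so the reply applies to a different series.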
* Re: CLOSE/OPEN race 2016-11-13 14:47 ` Trond Myklebust @ 2016-11-14 14:53 ` Benjamin Coddington 2016-11-14 16:29 ` Trond Myklebust 0 siblings, 1 reply; 17+ messages in thread From: Benjamin Coddington @ 2016-11-14 14:53 UTC (permalink / raw) To: Trond Myklebust; +Cc: jlayton, linux-nfs On 13 Nov 2016, at 9:47, Trond Myklebust wrote: > On Sat, 2016-11-12 at 21:56 -0500, Jeff Layton wrote: >> On Sat, 2016-11-12 at 16:16 -0500, Jeff Layton wrote: >>> >>> On Sat, 2016-11-12 at 13:03 -0500, Benjamin Coddington wrote: >>>> >>>> >>>> On 12 Nov 2016, at 11:52, Jeff Layton wrote: >>>> >>>>> >>>>> >>>>> On Sat, 2016-11-12 at 10:31 -0500, Benjamin Coddington wrote: >>>>>> >>>>>> >>>>>> On 12 Nov 2016, at 7:54, Jeff Layton wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Sat, 2016-11-12 at 06:08 -0500, Benjamin Coddington >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I've been seeing the following on a modified version of >>>>>>>> generic/089 >>>>>>>> that gets the client stuck sending LOCK with >>>>>>>> NFS4ERR_OLD_STATEID. >>>>>>>> >>>>>>>> 1. Client has open stateid A, sends a CLOSE >>>>>>>> 2. Client sends OPEN with same owner >>>>>>>> 3. Client sends another OPEN with same owner >>>>>>>> 4. Client gets a reply to OPEN in 3, stateid is B.2 >>>>>>>> (stateid B >>>>>>>> sequence 2) >>>>>>>> 5. Client does LOCK,LOCKU,FREE_STATEID from B.2 >>>>>>>> 6. Client gets a reply to CLOSE in 1 >>>>>>>> 7. Client gets reply to OPEN in 2, stateid is B.1 >>>>>>>> 8. Client sends LOCK with B.1 - OLD_STATEID, now stuck in >>>>>>>> a loop >>>>>>>> >>>>>>>> The CLOSE response in 6 causes us to clear >>>>>>>> NFS_OPEN_STATE, so that >>>>>>>> the OPEN >>>>>>>> response in 7 is able to update the open_stateid even >>>>>>>> though it has a >>>>>>>> lower >>>>>>>> sequence number. 
>>>>>>>> >>>>>>>> I think this case could be handled by never updating the >>>>>>>> open_stateid >>>>>>>> if the >>>>>>>> stateids match but the sequence number of the new state >>>>>>>> is less than >>>>>>>> the >>>>>>>> current open_state. >>>>>>>> >>>>>>> >>>>>>> What kernel is this on? >>>>>> >>>>>> On v4.9-rc2 with a couple fixups. Without them, I can't test >>>>>> long >>>>>> enough to >>>>>> reproduce this race. I don't think any of those are involved >>>>>> in this >>>>>> problem, though. >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Yes, that seems wrong. The client should be picking B.2 for >>>>>>> the open >>>>>>> stateid to use. I think that decision of whether to take a >>>>>>> seqid is >>>>>>> made >>>>>>> in nfs_need_update_open_stateid. The logic in there looks >>>>>>> correct to >>>>>>> me >>>>>>> at first glance though. >>>>>> >>>>>> nfs_need_update_open_stateid() will return true if >>>>>> NFS_OPEN_STATE is >>>>>> unset. >>>>>> That's the precondition set up by steps 1-6. Perhaps it >>>>>> should not >>>>>> update >>>>>> the stateid if they match but the sequence number is less, >>>>>> and still set >>>>>> NFS_OPEN_STATE once more. That will fix _this_ case. Are >>>>>> there other >>>>>> cases >>>>>> where that would be a problem? >>>>>> >>>>>> Ben >>>>> >>>>> That seems wrong. >>>> >>>> I'm not sure what you mean: what seems wrong? >>>> >>> >>> Sorry, it seems wrong that the client would issue the LOCK with B.1 >>> there. >>> >>>> >>>>> >>>>> >>>>> The only close was sent in step 1, and that was for a >>>>> completely different stateid (A rather than B). It seems likely >>>>> that >>>>> that is where the bug is. >>>> >>>> I'm still not sure what point you're trying to make.. >>>> >>>> Even though the close was sent in step 1, the response wasn't >>>> processed >>>> until step 6.. >>> >>> Not really a point per-se, I was just saying where I think the bug >>> might >>> be... >>> >>> When you issue a CLOSE, you issue it vs. 
a particular stateid (stateid "A" in this case). Once the open stateid
>>> has been superseded by "B", the closing of "A" should have no effect.
>>>
>>> Perhaps nfs_clear_open_stateid needs to check and see whether the open
>>> stateid has been superseded before doing its thing?
>>>
>>
>> Ok, I see something that might be a problem in this call in
>> nfs4_close_done:
>>
>>        nfs_clear_open_stateid(state, &calldata->arg.stateid,
>>                        res_stateid, calldata->arg.fmode);
>>
>> Note that we pass two nfs4_stateids to this call. The first is the
>> stateid that got sent in the CLOSE call, and the second is the stateid
>> that came back in the CLOSE response.
>>
>> RFC5661 and RFC7530 both indicate that the stateid in a CLOSE response
>> should be ignored.
>>
>> So, I think a patch like this may be in order. As to whether it will
>> fix this bug, I sort of doubt it, but it might not hurt to test it out?
>>
>> ----------------------8<--------------------------
>>
>> [RFC PATCH] nfs: properly ignore the stateid in a CLOSE response
>>
>> Signed-off-by: Jeff Layton
>> ---
>>  fs/nfs/nfs4proc.c | 14 +++-----------
>>  1 file changed, 3 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 7897826d7c51..58413bd0aae2 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -1451,7 +1451,6 @@ static void nfs_resync_open_stateid_locked(struct nfs4_state *state)
>>  }
>>
>>  static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
>> -		nfs4_stateid *arg_stateid,
>>  		nfs4_stateid *stateid, fmode_t fmode)
>>  {
>>  	clear_bit(NFS_O_RDWR_STATE, &state->flags);
>> @@ -1467,12 +1466,8 @@ static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
>>  		clear_bit(NFS_O_WRONLY_STATE, &state->flags);
>>  		clear_bit(NFS_OPEN_STATE, &state->flags);
>>  	}
>> -	if (stateid == NULL)
>> -		return;
>>  	/* Handle races with OPEN */
>> -	if (!nfs4_stateid_match_other(arg_stateid, &state->open_stateid) ||
>> -	    (nfs4_stateid_match_other(stateid, &state->open_stateid) &&
>> -	     !nfs4_stateid_is_newer(stateid, &state->open_stateid))) {
>> +	if (!nfs4_stateid_match_other(stateid, &state->open_stateid)) {

> No. I think what we should be doing here is
>
> 1) if (nfs4_stateid_match_other(arg_stateid, &state->open_stateid)
> then

You must mean (!nfs4_stateid_match_other(arg_stateid, &state->open_stateid)

> just ignore the result and return immediately, because it applies to a
> completely different stateid.
>
> 2) if (nfs4_stateid_match_other(stateid, &state->open_stateid)
> && !nfs4_stateid_is_newer(stateid, &state->open_stateid))) then
> resync,
> because this was likely an OPEN_DOWNGRADE that has raced with one or
> more OPEN calls.

OK, but these do not help the originally reported race because at the time
that the CLOSE response handling does a resync - I presume
nfs_resync_open_stateid_locked() - all the state counters are zero, which
bypasses resetting NFS_OPEN_STATE, which, if unset, allows any stateid to
update the open_stateid.  I need something like:

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a230764b7d07..c9aa166c45aa 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1425,8 +1425,13 @@ static void nfs_test_and_clear_all_open_stateid(struct nfs4_state *state)
 static bool nfs_need_update_open_stateid(struct nfs4_state *state,
 		const nfs4_stateid *stateid, nfs4_stateid *freeme)
 {
-	if (test_and_set_bit(NFS_OPEN_STATE, &state->flags) == 0)
+	if (test_and_set_bit(NFS_OPEN_STATE, &state->flags) == 0) {
+		if (nfs4_stateid_match_other(stateid, &state->open_stateid) &&
+		    !nfs4_stateid_is_newer(stateid, &state->open_stateid))
+			return false;
 		return true;
+	}
+
 	if (!nfs4_stateid_match_other(stateid, &state->open_stateid)) {
 		nfs4_stateid_copy(freeme, &state->open_stateid);
 		nfs_test_and_clear_all_open_stateid(state);

I've got that in testing, and I'll let it run for a day or so. 
Another way to simplify CLOSE/OPEN races might be to never reuse an openowner if we send a close. Any opens after a close would have to use a new mutated openowner, so the state always proceeds from open to close and never back again. Any reason not to do that? Ben ^ permalink raw reply related [flat|nested] 17+ messages in thread
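Replaying the eight-step race from the first message against a guard like the one in the diff above shows the late B.1 reply being rejected even though the CLOSE reply cleared the open-state flag first. A toy simulation, with all names illustrative rather than fs/nfs code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy client open state: an NFS_OPEN_STATE-like flag plus the stateid. */
struct toy_stateid { uint32_t seqid; uint8_t other[12]; };
struct toy_state {
	bool open_set;
	struct toy_stateid open_stateid;
};

/* Same "other" bytes means same stateid series (e.g. B.1 and B.2). */
static bool toy_match_other(const struct toy_stateid *a,
			    const struct toy_stateid *b)
{
	return memcmp(a->other, b->other, sizeof(a->other)) == 0;
}

/*
 * Guarded update in the spirit of the nfs_need_update_open_stateid()
 * diff above: even when the flag was cleared by a late CLOSE reply, a
 * stateid from the same series with an older-or-equal seqid re-arms the
 * flag but does not overwrite the newer stored stateid.
 */
static bool toy_update_open(struct toy_state *st, const struct toy_stateid *in)
{
	if (toy_match_other(in, &st->open_stateid) &&
	    (int32_t)(in->seqid - st->open_stateid.seqid) <= 0) {
		st->open_set = true;
		return false;
	}
	st->open_stateid = *in;
	st->open_set = true;
	return true;
}
```

In the race, the client holds B.2 when the CLOSE reply for A lands (clearing the flag) and the straggling OPEN reply carrying B.1 arrives last; the guard keeps B.2 in place so a subsequent LOCK is not sent with the old seqid.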
* Re: CLOSE/OPEN race 2016-11-14 14:53 ` Benjamin Coddington @ 2016-11-14 16:29 ` Trond Myklebust 2016-11-14 18:40 ` Benjamin Coddington 0 siblings, 1 reply; 17+ messages in thread From: Trond Myklebust @ 2016-11-14 16:29 UTC (permalink / raw) To: bcodding; +Cc: jlayton, linux-nfs

On Mon, 2016-11-14 at 09:53 -0500, Benjamin Coddington wrote:
> On 13 Nov 2016, at 9:47, Trond Myklebust wrote:
> [...]
> > > Ok, I see something that might be a problem in this call in
> > > nfs4_close_done:
> > >
> > >        nfs_clear_open_stateid(state, &calldata->arg.stateid,
wqDCoMKgwqByZXNfc3RhdGVpZCzCoA0KPiA+ID4gY2FsbGRhdGEtPmFyZy5mbW9kZSk7DQo+ID4g PiANCj4gPiA+IE5vdGUgdGhhdCB3ZSBwYXNzIHR3byBuZnM0X3N0YXRlaWRzIHRvIHRoaXMgY2Fs bC4gVGhlIGZpcnN0IGlzDQo+ID4gPiB0aGUNCj4gPiA+IHN0YXRlaWQgdGhhdCBnb3Qgc2VudCBp biB0aGUgQ0xPU0UgY2FsbCwgYW5kIHRoZSBzZWNvbmQgaXMgdGhlDQo+ID4gPiBzdGF0ZWlkDQo+ ID4gPiB0aGF0IGNhbWUgYmFjayBpbiB0aGUgQ0xPU0UgcmVzcG9uc2UuDQo+ID4gPiANCj4gPiA+ IFJGQzU2NjEgYW5kIFJGQzc1MzAgYm90aCBpbmRpY2F0ZSB0aGF0IHRoZSBzdGF0ZWlkIGluIGEg Q0xPU0UNCj4gPiA+IHJlc3BvbnNlDQo+ID4gPiBzaG91bGQgYmUgaWdub3JlZC4NCj4gPiA+IA0K PiA+ID4gU28sIEkgdGhpbmsgYSBwYXRjaCBsaWtlIHRoaXMgbWF5IGJlIGluIG9yZGVyLiBBcyB0 byB3aGV0aGVyIGl0DQo+ID4gPiB3aWxsDQo+ID4gPiBmaXggdGhpcyBidWcsIEkgc29ydCBvZiBk b3VidCBpdCwgYnV0IGl0IG1pZ2h0IG5vdCBodXJ0IHRvIHRlc3QNCj4gPiA+IGl0DQo+ID4gPiBv dXQ/DQo+ID4gPiANCj4gPiA+IC0tLS0tLS0tLS0tLS0tLS0tLS0tLS04PC0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tDQo+ID4gPiANCj4gPiA+IFtSRkMgUEFUQ0hdIG5mczogcHJvcGVybHkgaWdu b3JlIHRoZSBzdGF0ZWlkIGluIGEgQ0xPU0UgcmVzcG9uc2UNCj4gPiA+IA0KPiA+ID4gU2lnbmVk LW9mZi1ieTogSmVmZiBMYXl0b27CoA0KPiA+ID4gLS0tDQo+ID4gPiDCoGZzL25mcy9uZnM0cHJv Yy5jIHwgMTQgKysrLS0tLS0tLS0tLS0NCj4gPiA+IMKgMSBmaWxlIGNoYW5nZWQsIDMgaW5zZXJ0 aW9ucygrKSwgMTEgZGVsZXRpb25zKC0pDQo+ID4gPiANCj4gPiA+IGRpZmYgLS1naXQgYS9mcy9u ZnMvbmZzNHByb2MuYyBiL2ZzL25mcy9uZnM0cHJvYy5jDQo+ID4gPiBpbmRleCA3ODk3ODI2ZDdj NTEuLjU4NDEzYmQwYWFlMiAxMDA2NDQNCj4gPiA+IC0tLSBhL2ZzL25mcy9uZnM0cHJvYy5jDQo+ ID4gPiArKysgYi9mcy9uZnMvbmZzNHByb2MuYw0KPiA+ID4gQEAgLTE0NTEsNyArMTQ1MSw2IEBA IHN0YXRpYyB2b2lkDQo+ID4gPiBuZnNfcmVzeW5jX29wZW5fc3RhdGVpZF9sb2NrZWQoc3RydWN0 IG5mczRfc3RhdGUgKnN0YXRlKQ0KPiA+ID4gwqB9DQo+ID4gPiDCoA0KPiA+ID4gwqBzdGF0aWMg dm9pZCBuZnNfY2xlYXJfb3Blbl9zdGF0ZWlkX2xvY2tlZChzdHJ1Y3QgbmZzNF9zdGF0ZQ0KPiA+ ID4gKnN0YXRlLA0KPiA+ID4gLQkJbmZzNF9zdGF0ZWlkICphcmdfc3RhdGVpZCwNCj4gPiA+IMKg CQluZnM0X3N0YXRlaWQgKnN0YXRlaWQsIGZtb2RlX3QgZm1vZGUpDQo+ID4gPiDCoHsNCj4gPiA+ IMKgCWNsZWFyX2JpdChORlNfT19SRFdSX1NUQVRFLCAmc3RhdGUtPmZsYWdzKTsNCj4gPiA+IEBA 
IC0xNDY3LDEyICsxNDY2LDggQEAgc3RhdGljIHZvaWQNCj4gPiA+IG5mc19jbGVhcl9vcGVuX3N0 YXRlaWRfbG9ja2VkKHN0cnVjdCBuZnM0X3N0YXRlICpzdGF0ZSwNCj4gPiA+IMKgCQljbGVhcl9i aXQoTkZTX09fV1JPTkxZX1NUQVRFLCAmc3RhdGUtPmZsYWdzKTsNCj4gPiA+IMKgCQljbGVhcl9i aXQoTkZTX09QRU5fU1RBVEUsICZzdGF0ZS0+ZmxhZ3MpOw0KPiA+ID4gwqAJfQ0KPiA+ID4gLQlp ZiAoc3RhdGVpZCA9PSBOVUxMKQ0KPiA+ID4gLQkJcmV0dXJuOw0KPiA+ID4gwqAJLyogSGFuZGxl IHJhY2VzIHdpdGggT1BFTiAqLw0KPiA+ID4gLQlpZiAoIW5mczRfc3RhdGVpZF9tYXRjaF9vdGhl cihhcmdfc3RhdGVpZCwgJnN0YXRlLQ0KPiA+ID4gPiANCj4gPiA+ID4gb3Blbl9zdGF0ZWlkKSB8 fA0KPiA+ID4gLQnCoMKgwqDCoChuZnM0X3N0YXRlaWRfbWF0Y2hfb3RoZXIoc3RhdGVpZCwgJnN0 YXRlLQ0KPiA+ID4gPm9wZW5fc3RhdGVpZCkNCj4gPiA+ICYmDQo+ID4gPiAtCcKgwqDCoMKgIW5m czRfc3RhdGVpZF9pc19uZXdlcihzdGF0ZWlkLCAmc3RhdGUtDQo+ID4gPiA+b3Blbl9zdGF0ZWlk KSkpDQo+ID4gPiB7DQo+ID4gPiArCWlmICghbmZzNF9zdGF0ZWlkX21hdGNoX290aGVyKHN0YXRl aWQsICZzdGF0ZS0NCj4gPiA+ID4gDQo+ID4gPiA+IG9wZW5fc3RhdGVpZCkpIHsNCj4gPiANCj4g PiBOby4gSSB0aGluayB3aGF0IHdlIHNob3VsZCBiZSBkb2luZyBoZXJlIGlzDQo+ID4gDQo+ID4g MSkgaWYgKG5mczRfc3RhdGVpZF9tYXRjaF9vdGhlcihhcmdfc3RhdGVpZCwgJnN0YXRlLT5vcGVu X3N0YXRlaWQpwqANCj4gPiB0aGVuDQo+IA0KPiBZb3UgbXVzdCBtZWFuICghbmZzNF9zdGF0ZWlk X21hdGNoX290aGVyKGFyZ19zdGF0ZWlkLMKgDQo+ICZzdGF0ZS0+b3Blbl9zdGF0ZWlkKQ0KDQpT b3JyeS4gWWVzLg0KDQo+ID4gDQo+ID4ganVzdCBpZ25vcmUgdGhlIHJlc3VsdCBhbmQgcmV0dXJu IGltbWVkaWF0ZWx5LCBiZWNhdXNlIGl0IGFwcGxpZXMNCj4gPiB0byBhDQo+ID4gY29tcGxldGVs eSBkaWZmZXJlbnQgc3RhdGVpZC4NCj4gPiANCj4gPiAyKSBpZiAobmZzNF9zdGF0ZWlkX21hdGNo X290aGVyKHN0YXRlaWQsICZzdGF0ZS0+b3Blbl9zdGF0ZWlkKQ0KPiA+ICYmwqAhbmZzNF9zdGF0 ZWlkX2lzX25ld2VyKHN0YXRlaWQsICZzdGF0ZS0+b3Blbl9zdGF0ZWlkKSkpIHRoZW7CoA0KPiA+ IHJlc3luYywNCj4gPiBiZWNhdXNlIHRoaXMgd2FzIGxpa2VseSBhbiBPUEVOX0RPV05HUkFERSB0 aGF0IGhhcyByYWNlZCB3aXRoIG9uZQ0KPiA+IG9yDQo+ID4gbW9yZSBPUEVOIGNhbGxzLg0KPiAN Cj4gT0ssIGJ1dCB0aGVzZSBkbyBub3QgaGVscCB0aGUgb3JpZ2luYWxseSByZXBvcnRlZCByYWNl IGJlY2F1c2UgYXQNCj4gdGhlwqANCj4gdGltZQ0KPiB0aGF0IHRoZSBDTE9TRSByZXNwb25zZSBo 
YW5kbGluZyBkb2VzIGEgcmVzeW5jIC0gSSBwcmVzdW1lDQo+IG5mc19yZXN5bmNfb3Blbl9zdGF0 ZWlkX2xvY2tlZCgpIC0gYWxsIHRoZSBzdGF0ZSBjb3VudGVycyBhcmUgemVybyzCoA0KPiB3aGlj aA0KPiBieXBhc3NlcyByZXNldHRpbmcgTkZTX09QRU5fU1RBVEUsIHdoaWNoLCBpZiB1bnNldCwg YWxsb3dzIGFueQ0KPiBzdGF0ZWlkwqANCj4gdG8NCj4gdXBkYXRlIHRoZSBvcGVuX3N0YXRlaWQu DQo+wqANCg0KUGxlYXNlIHNlZSB0aGUgMiBwYXRjaGVzIEkganVzdCBwb3N0ZWQuIFRoZXkgc2hv dWxkIGZpeCB0aGUgcHJvYmxlbS4= ^ permalink raw reply [flat|nested] 17+ messages in thread
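Trond's two proposed checks for the CLOSE-reply path can be sketched in userspace as follows. Everything here is a simplified stand-in: the nfs4_stateid type, the flags word, and the resynced field are illustrative only, not the kernel's data structures.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t seqid;
    char     other[12];   /* identifies the state; seqid orders updates */
} nfs4_stateid;

struct nfs4_state {
    nfs4_stateid open_stateid;
    int          flags;       /* stand-in for the NFS_* state bits */
    int          resynced;    /* records that a resync was requested */
};

#define NFS_OPEN_STATE 0x1

static int match_other(const nfs4_stateid *a, const nfs4_stateid *b)
{
    return memcmp(a->other, b->other, sizeof(a->other)) == 0;
}

static int is_newer(const nfs4_stateid *s1, const nfs4_stateid *s2)
{
    return (int32_t)(s1->seqid - s2->seqid) > 0;
}

/*
 * Sketch of the CLOSE-reply handling Trond outlines:
 * 1) arg_stateid (what the CLOSE was sent against) no longer matches
 *    the current open_stateid: the CLOSE applied to a superseded
 *    stateid, so ignore the result entirely.
 * 2) the returned stateid matches but is not newer: likely an
 *    OPEN_DOWNGRADE that raced with one or more OPENs, so resync
 *    rather than clear.
 * Otherwise, clear the open state as before.
 */
static void clear_open_stateid_sketch(struct nfs4_state *state,
                                      const nfs4_stateid *arg_stateid,
                                      const nfs4_stateid *stateid)
{
    if (!match_other(arg_stateid, &state->open_stateid))
        return;                              /* check 1: stale CLOSE */
    if (match_other(stateid, &state->open_stateid) &&
        !is_newer(stateid, &state->open_stateid)) {
        state->resynced = 1;                 /* check 2: resync */
        return;
    }
    state->flags &= ~NFS_OPEN_STATE;         /* actually close */
}
```

As Ben notes in the exchange above, these two checks alone do not cover the reported race: once all the open counters hit zero, the resync path skips resetting NFS_OPEN_STATE, so this sketch illustrates the proposed checks rather than the complete fix.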
* Re: CLOSE/OPEN race 2016-11-14 16:29 ` Trond Myklebust @ 2016-11-14 18:40 ` Benjamin Coddington 0 siblings, 0 replies; 17+ messages in thread From: Benjamin Coddington @ 2016-11-14 18:40 UTC (permalink / raw) To: Trond Myklebust; +Cc: jlayton, linux-nfs

On 14 Nov 2016, at 11:29, Trond Myklebust wrote:
>
> Please see the 2 patches I just posted. They should fix the
> problem.

Seems like they do.  Thanks!  I'll leave them spinning in that test
overnight to really give them a workout.

Ben
 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: CLOSE/OPEN race 2016-11-12 11:08 CLOSE/OPEN race Benjamin Coddington 2016-11-12 12:54 ` Jeff Layton @ 2016-11-12 18:16 ` Trond Myklebust 2016-11-12 18:46 ` Benjamin Coddington 2016-11-13 3:09 ` Jeff Layton 1 sibling, 2 replies; 17+ messages in thread From: Trond Myklebust @ 2016-11-12 18:16 UTC (permalink / raw) To: Benjamin Coddington; +Cc: Linux List

> On Nov 12, 2016, at 06:08, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> I've been seeing the following on a modified version of generic/089
> that gets the client stuck sending LOCK with NFS4ERR_OLD_STATEID.
>
> 1. Client has open stateid A, sends a CLOSE
> 2. Client sends OPEN with same owner
> 3. Client sends another OPEN with same owner
> 4. Client gets a reply to OPEN in 3, stateid is B.2 (stateid B sequence 2)
> 5. Client does LOCK,LOCKU,FREE_STATEID from B.2
> 6. Client gets a reply to CLOSE in 1
> 7. Client gets reply to OPEN in 2, stateid is B.1
> 8. Client sends LOCK with B.1 - OLD_STATEID, now stuck in a loop
>
> The CLOSE response in 6 causes us to clear NFS_OPEN_STATE, so that the OPEN
> response in 7 is able to update the open_stateid even though it has a lower
> sequence number.

Hmm… We probably should not do that if the stateid.other field of A (i.e.
the one supplied as the argument to CLOSE) does not match the stateid.other
of B.
In fact, the reply in (4), where the stateid changes to B, should be the
thing that resets the OPEN state.
 ^ permalink raw reply [flat|nested] 17+ messages in thread
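Trond's distinction between the stateid's "other" field and its sequence number is the crux of the race. A minimal userspace sketch of the two comparisons involved - the names echo the kernel helpers nfs4_stateid_match_other() and nfs4_stateid_is_newer() discussed in this thread, but the types and bodies below are simplified stand-ins, not the kernel implementation:

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t seqid;
    char     other[12];   /* identifies the state; seqid orders updates to it */
} nfs4_stateid;

/* Same "other" field => same underlying open state on the server. */
static int stateid_match_other(const nfs4_stateid *a, const nfs4_stateid *b)
{
    return memcmp(a->other, b->other, sizeof(a->other)) == 0;
}

/* Wraparound-safe: true if s1's seqid is strictly newer than s2's. */
static int stateid_is_newer(const nfs4_stateid *s1, const nfs4_stateid *s2)
{
    return (int32_t)(s1->seqid - s2->seqid) > 0;
}
```

Under this comparison B.1 is never newer than B.2, so a late reply carrying B.1 should never replace a cached B.2 - which is exactly the invariant that step 7 violates once NFS_OPEN_STATE has been cleared.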
* Re: CLOSE/OPEN race 2016-11-12 18:16 ` Trond Myklebust @ 2016-11-12 18:46 ` Benjamin Coddington 2016-11-13 3:09 ` Jeff Layton 1 sibling, 0 replies; 17+ messages in thread From: Benjamin Coddington @ 2016-11-12 18:46 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux List On 12 Nov 2016, at 13:16, Trond Myklebust wrote: > >> On Nov 12, 2016, at 06:08, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> I've been seeing the following on a modified version of generic/089 >> that gets the client stuck sending LOCK with NFS4ERR_OLD_STATEID. >> >> 1. Client has open stateid A, sends a CLOSE >> 2. Client sends OPEN with same owner >> 3. Client sends another OPEN with same owner >> 4. Client gets a reply to OPEN in 3, stateid is B.2 (stateid B >> sequence 2) >> 5. Client does LOCK,LOCKU,FREE_STATEID from B.2 >> 6. Client gets a reply to CLOSE in 1 >> 7. Client gets reply to OPEN in 2, stateid is B.1 >> 8. Client sends LOCK with B.1 - OLD_STATEID, now stuck in a loop >> >> The CLOSE response in 6 causes us to clear NFS_OPEN_STATE, so that >> the OPEN >> response in 7 is able to update the open_stateid even though it has a >> lower >> sequence number. > > Hmm… We probably should not do that if the stateid.other field of A > (i.e. > the one supplied as the argument to CLOSE) does not match the > stateid.other of B. In fact, the reply in (4), where the stateid > changes > to B, should be the thing that resets the OPEN state. At 4, by reset the OPEN state, do you mean we should expect and wait for a pending close to complete, or we should start over with a brand new state? Maybe something else.. The reason the NFS_OPEN_STATE flag is cleared in 6 is that all the state counts (n_rdwr, n_rdonly, n_wronly) are zero. There could be a "pending open" flag or counter that would help 6 not clear NFS_OPEN_STATE.. ^ permalink raw reply [flat|nested] 17+ messages in thread
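The precondition Ben describes can be replayed in miniature. The toy model below captures only the shape of the update decision - any stateid wins once the open flag is unset - and is an assumption-laden sketch, not the kernel's nfs_need_update_open_stateid() logic:

```c
#include <stdint.h>

typedef struct {
    uint32_t seqid;
    char     other[12];
} nfs4_stateid;

struct open_state {
    nfs4_stateid stateid;
    int          open;    /* stand-in for NFS_OPEN_STATE */
};

static int is_newer(const nfs4_stateid *a, const nfs4_stateid *b)
{
    return (int32_t)(a->seqid - b->seqid) > 0;
}

/* Shape of the update decision: any stateid is accepted when the
 * open flag is unset; otherwise only strictly newer seqids win. */
static void update_open_stateid(struct open_state *st, const nfs4_stateid *sid)
{
    if (!st->open || is_newer(sid, &st->stateid)) {
        st->stateid = *sid;
        st->open = 1;
    }
}
```

Replaying steps 6-7 against this model: while the flag is set, the stale B.1 reply is rejected; once the CLOSE reply clears the flag (all the n_rdwr/n_rdonly/n_wronly counters being zero), B.1 overwrites B.2, reproducing the stuck-LOCK condition.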
* Re: CLOSE/OPEN race 2016-11-12 18:16 ` Trond Myklebust 2016-11-12 18:46 ` Benjamin Coddington @ 2016-11-13 3:09 ` Jeff Layton 1 sibling, 0 replies; 17+ messages in thread From: Jeff Layton @ 2016-11-13 3:09 UTC (permalink / raw) To: Trond Myklebust, Benjamin Coddington; +Cc: Linux List On Sat, 2016-11-12 at 18:16 +0000, Trond Myklebust wrote: > > On Nov 12, 2016, at 06:08, Benjamin Coddington <bcodding@redhat.com> wrote: > > > > I've been seeing the following on a modified version of generic/089 > > that gets the client stuck sending LOCK with NFS4ERR_OLD_STATEID. > > > > 1. Client has open stateid A, sends a CLOSE > > 2. Client sends OPEN with same owner > > 3. Client sends another OPEN with same owner > > 4. Client gets a reply to OPEN in 3, stateid is B.2 (stateid B sequence 2) > > 5. Client does LOCK,LOCKU,FREE_STATEID from B.2 > > 6. Client gets a reply to CLOSE in 1 > > 7. Client gets reply to OPEN in 2, stateid is B.1 > > 8. Client sends LOCK with B.1 - OLD_STATEID, now stuck in a loop > > > > The CLOSE response in 6 causes us to clear NFS_OPEN_STATE, so that the OPEN > > response in 7 is able to update the open_stateid even though it has a lower > > sequence number. > > Hmm… We probably should not do that if the stateid.other field of A (i.e. the one supplied as the argument to CLOSE) does not match the stateid.other of B. > In fact, the reply in (4), where the stateid changes to B, should be the thing that resets the OPEN state. It looks like that's already the case in nfs_clear_open_stateid_locked, though I don't think we ought to be doing anything with the stateid in the CLOSE response. I sent a draft patch in another part of this thread, but I don't quite see how that would cause this problem. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2016-11-14 18:40 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-11-12 11:08 CLOSE/OPEN race Benjamin Coddington 2016-11-12 12:54 ` Jeff Layton 2016-11-12 15:31 ` Benjamin Coddington 2016-11-12 16:52 ` Jeff Layton 2016-11-12 18:03 ` Benjamin Coddington 2016-11-12 21:16 ` Jeff Layton 2016-11-13 2:56 ` Jeff Layton 2016-11-13 13:34 ` Benjamin Coddington 2016-11-13 14:22 ` Benjamin Coddington 2016-11-13 14:33 ` Jeff Layton 2016-11-13 14:47 ` Trond Myklebust 2016-11-14 14:53 ` Benjamin Coddington 2016-11-14 16:29 ` Trond Myklebust 2016-11-14 18:40 ` Benjamin Coddington 2016-11-12 18:16 ` Trond Myklebust 2016-11-12 18:46 ` Benjamin Coddington 2016-11-13 3:09 ` Jeff Layton