* [PATCH 0/4] Skipped unlocks @ 2017-02-17 18:37 Benjamin Coddington 2017-02-17 18:37 ` [PATCH 1/4] NFS4: remove a redundant lock range check Benjamin Coddington ` (3 more replies) 0 siblings, 4 replies; 11+ messages in thread From: Benjamin Coddington @ 2017-02-17 18:37 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs Well over a year ago I made other attempts to fix the problem of NFS failing to send an unlock when signalled. Those attempts were terrible. Here's another, smaller version that keeps two simple fixes and splits out the behavior for waiting on I/O completion to be different for the v3 and v4 case. For v4, we can easily ensure the unlock is always sent by having the rpc_task perform the wait. For v3, that approach would require significant changes to the way NLM works, so a simpler approach of skipping the wait is used as was suggested by Trond. I think this is probably a little late for 4.11. Comments and review are welcomed. Benjamin Coddington (4): NFS4: remove a redundant lock range check NFS: Move the flock open mode check into nfs_flock() locks: Set FL_CLOSE when removing flock locks on close() NFS: Always wait for I/O completion before unlock fs/locks.c | 2 +- fs/nfs/file.c | 33 +++++++++++++++++---------------- fs/nfs/nfs3proc.c | 13 +++++++++++++ fs/nfs/nfs4proc.c | 24 +++++++----------------- fs/nfs/pagelist.c | 1 + 5 files changed, 39 insertions(+), 34 deletions(-) -- 2.9.3 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/4] NFS4: remove a redundant lock range check 2017-02-17 18:37 [PATCH 0/4] Skipped unlocks Benjamin Coddington @ 2017-02-17 18:37 ` Benjamin Coddington 2017-02-17 18:37 ` [PATCH 2/4] NFS: Move the flock open mode check into nfs_flock() Benjamin Coddington ` (2 subsequent siblings) 3 siblings, 0 replies; 11+ messages in thread From: Benjamin Coddington @ 2017-02-17 18:37 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs flock64_to_posix_lock() is already doing this check Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> --- fs/nfs/nfs4proc.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 0a0eaecf9676..9388899e4050 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -6570,9 +6570,6 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request) ctx = nfs_file_open_context(filp); state = ctx->state; - if (request->fl_start < 0 || request->fl_end < 0) - return -EINVAL; - if (IS_GETLK(cmd)) { if (state != NULL) return nfs4_proc_getlk(state, F_GETLK, request); -- 2.9.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/4] NFS: Move the flock open mode check into nfs_flock() 2017-02-17 18:37 [PATCH 0/4] Skipped unlocks Benjamin Coddington 2017-02-17 18:37 ` [PATCH 1/4] NFS4: remove a redundant lock range check Benjamin Coddington @ 2017-02-17 18:37 ` Benjamin Coddington 2017-02-17 18:37 ` [PATCH 3/4] locks: Set FL_CLOSE when removing flock locks on close() Benjamin Coddington 2017-02-17 18:37 ` [PATCH 4/4] NFS: Always wait for I/O completion before unlock Benjamin Coddington 3 siblings, 0 replies; 11+ messages in thread From: Benjamin Coddington @ 2017-02-17 18:37 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs We only need to check lock exclusive/shared types against open mode when flock() is used on NFS, so move it into the flock-specific path instead of checking it for all locks. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> --- fs/nfs/file.c | 20 +++++++++++++++++--- fs/nfs/nfs4proc.c | 14 -------------- 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 26dbe8b0c10d..a1499f3e8f7a 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -820,9 +820,23 @@ int nfs_flock(struct file *filp, int cmd, struct file_lock *fl) if (NFS_SERVER(inode)->flags & NFS_MOUNT_LOCAL_FLOCK) is_local = 1; - /* We're simulating flock() locks using posix locks on the server */ - if (fl->fl_type == F_UNLCK) - return do_unlk(filp, cmd, fl, is_local); + /* + * VFS doesn't require the open mode to match a flock() lock's type. + * NFS, however, may simulate flock() locking with posix locking which + * requires the open mode to match the lock type. 
+ */ + switch (fl->fl_type) { + case F_UNLCK: + return do_unlk(filp, cmd, fl, is_local); + case F_RDLCK: + if (!(filp->f_mode & FMODE_READ)) + return -EBADF; + break; + case F_WRLCK: + if (!(filp->f_mode & FMODE_WRITE)) + return -EBADF; + } + return do_setlk(filp, cmd, fl, is_local); } EXPORT_SYMBOL_GPL(nfs_flock); diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 9388899e4050..91f88bfbbe79 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -6592,20 +6592,6 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request) !test_bit(NFS_STATE_POSIX_LOCKS, &state->flags)) return -ENOLCK; - /* - * Don't rely on the VFS having checked the file open mode, - * since it won't do this for flock() locks. - */ - switch (request->fl_type) { - case F_RDLCK: - if (!(filp->f_mode & FMODE_READ)) - return -EBADF; - break; - case F_WRLCK: - if (!(filp->f_mode & FMODE_WRITE)) - return -EBADF; - } - status = nfs4_set_lock_state(state, request); if (status != 0) return status; -- 2.9.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/4] locks: Set FL_CLOSE when removing flock locks on close() 2017-02-17 18:37 [PATCH 0/4] Skipped unlocks Benjamin Coddington 2017-02-17 18:37 ` [PATCH 1/4] NFS4: remove a redundant lock range check Benjamin Coddington 2017-02-17 18:37 ` [PATCH 2/4] NFS: Move the flock open mode check into nfs_flock() Benjamin Coddington @ 2017-02-17 18:37 ` Benjamin Coddington 2017-02-17 18:37 ` [PATCH 4/4] NFS: Always wait for I/O completion before unlock Benjamin Coddington 3 siblings, 0 replies; 11+ messages in thread From: Benjamin Coddington @ 2017-02-17 18:37 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs Set FL_CLOSE in fl_flags as in locks_remove_posix() when clearing locks. NFS will check for this flag to ensure an unlock is sent in a following patch. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> --- fs/locks.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/locks.c b/fs/locks.c index 26811321d39b..af2031a1fcff 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2504,7 +2504,7 @@ locks_remove_flock(struct file *filp, struct file_lock_context *flctx) .fl_owner = filp, .fl_pid = current->tgid, .fl_file = filp, - .fl_flags = FL_FLOCK, + .fl_flags = FL_FLOCK | FL_CLOSE, .fl_type = F_UNLCK, .fl_end = OFFSET_MAX, }; -- 2.9.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 4/4] NFS: Always wait for I/O completion before unlock 2017-02-17 18:37 [PATCH 0/4] Skipped unlocks Benjamin Coddington ` (2 preceding siblings ...) 2017-02-17 18:37 ` [PATCH 3/4] locks: Set FL_CLOSE when removing flock locks on close() Benjamin Coddington @ 2017-02-17 18:37 ` Benjamin Coddington 3 siblings, 0 replies; 11+ messages in thread From: Benjamin Coddington @ 2017-02-17 18:37 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs NFS attempts to wait for read and write completion before unlocking in order to ensure that the data returned was protected by the lock. When this waiting is interrupted by a signal, the unlock may never be sent, and messages similar to the following are seen in the kernel ring buffer: [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3: [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 fl_pid=20183 [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 fl_pid=20185 For NFSv3, the missing unlock will cause the server to refuse conflicting locks indefinitely. For NFSv4, the leftover lock will be removed by the server after the lease timeout. This patch fixes this for NFSv3 by skipping the wait in order to immediately send the unlock if the FL_CLOSE flag is set when signaled. For NFSv4, this approach may cause the server to see the I/O as arriving with an old stateid, so, for the v4 case the fix is different: the wait on I/O completion is moved into nfs4_locku_ops' rpc_call_prepare(). This will cause the sleep to happen in rpciod context, and a signal to the originally waiting process will not cause the unlock to be skipped. 
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> --- fs/nfs/file.c | 13 ------------- fs/nfs/nfs3proc.c | 13 +++++++++++++ fs/nfs/nfs4proc.c | 7 +++++++ fs/nfs/pagelist.c | 1 + 4 files changed, 21 insertions(+), 13 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index a1499f3e8f7a..4707a35a5dea 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -684,7 +684,6 @@ static int do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local) { struct inode *inode = filp->f_mapping->host; - struct nfs_lock_context *l_ctx; int status; /* @@ -693,18 +692,6 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local) */ vfs_fsync(filp, 0); - l_ctx = nfs_get_lock_context(nfs_file_open_context(filp)); - if (!IS_ERR(l_ctx)) { - status = nfs_iocounter_wait(l_ctx); - nfs_put_lock_context(l_ctx); - if (status < 0) - return status; - } - - /* NOTE: special case - * If we're signalled while cleaning up locks on process exit, we - * still need to complete the unlock. - */ /* * Use local locking if mounted with "-onolock" or with appropriate * "-olocal_lock=" diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c index dc925b531f32..ec3f12571c82 100644 --- a/fs/nfs/nfs3proc.c +++ b/fs/nfs/nfs3proc.c @@ -869,6 +869,19 @@ static int nfs3_proc_lock(struct file *filp, int cmd, struct file_lock *fl) { struct inode *inode = file_inode(filp); + int status; + struct nfs_lock_context *l_ctx; + + /* For v3, always send the unlock on FL_CLOSE */ + if (fl->fl_type == F_UNLCK) { + l_ctx = nfs_get_lock_context(nfs_file_open_context(filp)); + if (!IS_ERR(l_ctx)) { + status = nfs_iocounter_wait(l_ctx); + nfs_put_lock_context(l_ctx); + if (status < 0 && !(fl->fl_flags & FL_CLOSE)) + return status; + } + } return nlmclnt_proc(NFS_SERVER(inode)->nlm_host, cmd, fl); } diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 91f88bfbbe79..052b97fd653d 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -5906,6 +5906,7 @@ struct nfs4_unlockdata { struct nfs_locku_res 
res; struct nfs4_lock_state *lsp; struct nfs_open_context *ctx; + struct nfs_lock_context *l_ctx; struct file_lock fl; struct nfs_server *server; unsigned long timestamp; @@ -5930,6 +5931,7 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl, atomic_inc(&lsp->ls_count); /* Ensure we don't close file until we're done freeing locks! */ p->ctx = get_nfs_open_context(ctx); + p->l_ctx = nfs_get_lock_context(ctx); memcpy(&p->fl, fl, sizeof(p->fl)); p->server = NFS_SERVER(inode); return p; @@ -5988,6 +5990,11 @@ static void nfs4_locku_prepare(struct rpc_task *task, void *data) /* Note: exit _without_ running nfs4_locku_done */ goto out_no_action; } + + if (!IS_ERR(calldata->l_ctx)) { + nfs_iocounter_wait(calldata->l_ctx); + nfs_put_lock_context(calldata->l_ctx); + } calldata->timestamp = jiffies; if (nfs4_setup_sequence(calldata->server, &calldata->arg.seq_args, diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index 6e629b856a00..8bf3cefdaa96 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -114,6 +114,7 @@ nfs_iocounter_wait(struct nfs_lock_context *l_ctx) return wait_on_atomic_t(&l_ctx->io_count, nfs_wait_atomic_killable, TASK_KILLABLE); } +EXPORT_SYMBOL(nfs_iocounter_wait); /* * nfs_page_group_lock - lock the head of the page group -- 2.9.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 0/4] Skipped unlocks @ 2017-02-17 18:46 Benjamin Coddington 2017-02-17 18:46 ` [PATCH 4/4] NFS: Always wait for I/O completion before unlock Benjamin Coddington 0 siblings, 1 reply; 11+ messages in thread From: Benjamin Coddington @ 2017-02-17 18:46 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs Well over a year ago I made other attempts to fix the problem of NFS failing to send an unlock when signalled. Those attempts were terrible. Here's another, smaller version that keeps two simple fixes and splits out the behavior for waiting on I/O completion to be different for the v3 and v4 case. For v4, we can easily ensure the unlock is always sent by having the rpc_task perform the wait. For v3, that approach would require significant changes to the way NLM works, so a simpler approach of skipping the wait is used as was suggested by Trond. I think this is probably a little late for 4.11. Comments and review are welcomed. since v1: - add Christoph's reviewed-by on 1/4 and 2/4 and fixup switch indentation on 2/4 Benjamin Coddington (4): NFS4: remove a redundant lock range check NFS: Move the flock open mode check into nfs_flock() locks: Set FL_CLOSE when removing flock locks on close() NFS: Always wait for I/O completion before unlock fs/locks.c | 2 +- fs/nfs/file.c | 31 ++++++++++++++++--------------- fs/nfs/nfs3proc.c | 13 +++++++++++++ fs/nfs/nfs4proc.c | 24 +++++++----------------- fs/nfs/pagelist.c | 1 + 5 files changed, 38 insertions(+), 33 deletions(-) -- 2.9.3 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 4/4] NFS: Always wait for I/O completion before unlock 2017-02-17 18:46 [PATCH v2 0/4] Skipped unlocks Benjamin Coddington @ 2017-02-17 18:46 ` Benjamin Coddington 2017-02-17 19:00 ` Trond Myklebust 0 siblings, 1 reply; 11+ messages in thread From: Benjamin Coddington @ 2017-02-17 18:46 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs NFS attempts to wait for read and write completion before unlocking in order to ensure that the data returned was protected by the lock. When this waiting is interrupted by a signal, the unlock may never be sent, and messages similar to the following are seen in the kernel ring buffer: [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3: [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 fl_pid=20183 [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 fl_pid=20185 For NFSv3, the missing unlock will cause the server to refuse conflicting locks indefinitely. For NFSv4, the leftover lock will be removed by the server after the lease timeout. This patch fixes this for NFSv3 by skipping the wait in order to immediately send the unlock if the FL_CLOSE flag is set when signaled. For NFSv4, this approach may cause the server to see the I/O as arriving with an old stateid, so, for the v4 case the fix is different: the wait on I/O completion is moved into nfs4_locku_ops' rpc_call_prepare(). This will cause the sleep to happen in rpciod context, and a signal to the originally waiting process will not cause the unlock to be skipped. 
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> --- fs/nfs/file.c | 13 ------------- fs/nfs/nfs3proc.c | 13 +++++++++++++ fs/nfs/nfs4proc.c | 7 +++++++ fs/nfs/pagelist.c | 1 + 4 files changed, 21 insertions(+), 13 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index a490f45df4db..adc67fe762e3 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -684,7 +684,6 @@ static int do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local) { struct inode *inode = filp->f_mapping->host; - struct nfs_lock_context *l_ctx; int status; /* @@ -693,18 +692,6 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local) */ vfs_fsync(filp, 0); - l_ctx = nfs_get_lock_context(nfs_file_open_context(filp)); - if (!IS_ERR(l_ctx)) { - status = nfs_iocounter_wait(l_ctx); - nfs_put_lock_context(l_ctx); - if (status < 0) - return status; - } - - /* NOTE: special case - * If we're signalled while cleaning up locks on process exit, we - * still need to complete the unlock. - */ /* * Use local locking if mounted with "-onolock" or with appropriate * "-olocal_lock=" diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c index dc925b531f32..ec3f12571c82 100644 --- a/fs/nfs/nfs3proc.c +++ b/fs/nfs/nfs3proc.c @@ -869,6 +869,19 @@ static int nfs3_proc_lock(struct file *filp, int cmd, struct file_lock *fl) { struct inode *inode = file_inode(filp); + int status; + struct nfs_lock_context *l_ctx; + + /* For v3, always send the unlock on FL_CLOSE */ + if (fl->fl_type == F_UNLCK) { + l_ctx = nfs_get_lock_context(nfs_file_open_context(filp)); + if (!IS_ERR(l_ctx)) { + status = nfs_iocounter_wait(l_ctx); + nfs_put_lock_context(l_ctx); + if (status < 0 && !(fl->fl_flags & FL_CLOSE)) + return status; + } + } return nlmclnt_proc(NFS_SERVER(inode)->nlm_host, cmd, fl); } diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 91f88bfbbe79..052b97fd653d 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -5906,6 +5906,7 @@ struct nfs4_unlockdata { struct nfs_locku_res 
res; struct nfs4_lock_state *lsp; struct nfs_open_context *ctx; + struct nfs_lock_context *l_ctx; struct file_lock fl; struct nfs_server *server; unsigned long timestamp; @@ -5930,6 +5931,7 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl, atomic_inc(&lsp->ls_count); /* Ensure we don't close file until we're done freeing locks! */ p->ctx = get_nfs_open_context(ctx); + p->l_ctx = nfs_get_lock_context(ctx); memcpy(&p->fl, fl, sizeof(p->fl)); p->server = NFS_SERVER(inode); return p; @@ -5988,6 +5990,11 @@ static void nfs4_locku_prepare(struct rpc_task *task, void *data) /* Note: exit _without_ running nfs4_locku_done */ goto out_no_action; } + + if (!IS_ERR(calldata->l_ctx)) { + nfs_iocounter_wait(calldata->l_ctx); + nfs_put_lock_context(calldata->l_ctx); + } calldata->timestamp = jiffies; if (nfs4_setup_sequence(calldata->server, &calldata->arg.seq_args, diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index 6e629b856a00..8bf3cefdaa96 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -114,6 +114,7 @@ nfs_iocounter_wait(struct nfs_lock_context *l_ctx) return wait_on_atomic_t(&l_ctx->io_count, nfs_wait_atomic_killable, TASK_KILLABLE); } +EXPORT_SYMBOL(nfs_iocounter_wait); /* * nfs_page_group_lock - lock the head of the page group -- 2.9.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] NFS: Always wait for I/O completion before unlock 2017-02-17 18:46 ` [PATCH 4/4] NFS: Always wait for I/O completion before unlock Benjamin Coddington @ 2017-02-17 19:00 ` Trond Myklebust 2017-02-17 19:15 ` Benjamin Coddington 0 siblings, 1 reply; 11+ messages in thread From: Trond Myklebust @ 2017-02-17 19:00 UTC (permalink / raw) To: bcodding, anna.schumaker; +Cc: linux-nfs On Fri, 2017-02-17 at 13:46 -0500, Benjamin Coddington wrote: > NFS attempts to wait for read and write completion before unlocking > in > order to ensure that the data returned was protected by the > lock.  When > this waiting is interrupted by a signal, the unlock may never be > sent, and > messages similar to the following are seen in the kernel ring buffer: > > [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3: > [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 > fl_pid=20183 > [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 > fl_pid=20185 > > For NFSv3, the missing unlock will cause the server to refuse > conflicting > locks indefinitely.  For NFSv4, the leftover lock will be removed > by > the > server after the lease timeout. > > This patch fixes this for NFSv3 by skipping the wait in order to > immediately send the unlock if the FL_CLOSE flag is set when > signaled.  For > NFSv4, this approach may cause the server to see the I/O as arriving > with > an old stateid, so, for the v4 case the fix is different: the wait on > I/O > completion is moved into nfs4_locku_ops' > rpc_call_prepare().  This > will > cause the sleep to happen in rpciod context, and a signal to the > originally > waiting process will not cause the unlock to be skipped.

NACK. I/O waits in rpciod contexts are NOT acceptable. rpciod is part of the memory reclaim chain, so having it sleep on I/O is deadlock prone.

Why is there a need to wait for I/O completion in the first place if the user has killed the task that held the lock? 'kill -9' will cause corruption; that's a fact that no amount of paper will cover over.

> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > --- > fs/nfs/file.c | 13 ------------- > fs/nfs/nfs3proc.c | 13 +++++++++++++ > fs/nfs/nfs4proc.c | 7 +++++++ > fs/nfs/pagelist.c | 1 + > 4 files changed, 21 insertions(+), 13 deletions(-) > > diff --git a/fs/nfs/file.c b/fs/nfs/file.c > index a490f45df4db..adc67fe762e3 100644 > --- a/fs/nfs/file.c > +++ b/fs/nfs/file.c > @@ -684,7 +684,6 @@ static int > do_unlk(struct file *filp, int cmd, struct file_lock *fl, int > is_local) > { > struct inode *inode = filp->f_mapping->host; > - struct nfs_lock_context *l_ctx; > int status; > > /* > @@ -693,18 +692,6 @@ do_unlk(struct file *filp, int cmd, struct > file_lock *fl, int is_local) > */ > vfs_fsync(filp, 0); > > - l_ctx = nfs_get_lock_context(nfs_file_open_context(filp)); > - if (!IS_ERR(l_ctx)) { > - status = nfs_iocounter_wait(l_ctx); > - nfs_put_lock_context(l_ctx); > - if (status < 0) > - return status; > - } > - > - /* NOTE: special case > - * If we're signalled while cleaning up locks on > process exit, we > - * still need to complete the unlock. > - */ > /* > * Use local locking if mounted with "-onolock" or with > appropriate > * "-olocal_lock=" > diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c > index dc925b531f32..ec3f12571c82 100644 > --- a/fs/nfs/nfs3proc.c > +++ b/fs/nfs/nfs3proc.c > @@ -869,6 +869,19 @@ static int > nfs3_proc_lock(struct file *filp, int cmd, struct file_lock *fl) > { > struct inode *inode = file_inode(filp); > + int status; > + struct nfs_lock_context *l_ctx; > + > + /* For v3, always send the unlock on FL_CLOSE */ > + if (fl->fl_type == F_UNLCK) { > + l_ctx = > nfs_get_lock_context(nfs_file_open_context(filp)); > + if (!IS_ERR(l_ctx)) { > + status = nfs_iocounter_wait(l_ctx); > + nfs_put_lock_context(l_ctx); > + if (status < 0 && !(fl->fl_flags & > FL_CLOSE)) > + return status; > + } > + } > > return nlmclnt_proc(NFS_SERVER(inode)->nlm_host, cmd, fl); > } > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > index 91f88bfbbe79..052b97fd653d 100644 > --- a/fs/nfs/nfs4proc.c > +++ b/fs/nfs/nfs4proc.c > @@ -5906,6 +5906,7 @@ struct nfs4_unlockdata { > struct nfs_locku_res res; > struct nfs4_lock_state *lsp; > struct nfs_open_context *ctx; > + struct nfs_lock_context *l_ctx; > struct file_lock fl; > struct nfs_server *server; > unsigned long timestamp; > @@ -5930,6 +5931,7 @@ static struct nfs4_unlockdata > *nfs4_alloc_unlockdata(struct file_lock *fl, > atomic_inc(&lsp->ls_count); > /* Ensure we don't close file until we're done freeing > locks! */ > p->ctx = get_nfs_open_context(ctx); > + p->l_ctx = nfs_get_lock_context(ctx); > memcpy(&p->fl, fl, sizeof(p->fl)); > p->server = NFS_SERVER(inode); > return p; > @@ -5988,6 +5990,11 @@ static void nfs4_locku_prepare(struct rpc_task > *task, void *data) > /* Note: exit _without_ running nfs4_locku_done */ > goto out_no_action; > } > + > + if (!IS_ERR(calldata->l_ctx)) { > + nfs_iocounter_wait(calldata->l_ctx); > + nfs_put_lock_context(calldata->l_ctx); > + } > calldata->timestamp = jiffies; > if (nfs4_setup_sequence(calldata->server, > &calldata->arg.seq_args, > diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c > index 6e629b856a00..8bf3cefdaa96 100644 > --- a/fs/nfs/pagelist.c > +++ b/fs/nfs/pagelist.c > @@ -114,6 +114,7 @@ nfs_iocounter_wait(struct nfs_lock_context > *l_ctx) > return wait_on_atomic_t(&l_ctx->io_count, > nfs_wait_atomic_killable, > TASK_KILLABLE); > } > +EXPORT_SYMBOL(nfs_iocounter_wait); > > /* > * nfs_page_group_lock - lock the head of the page group

-- Trond Myklebust Principal System Architect 4300 El Camino Real | Suite 100 Los Altos, CA 94022 W: 650-422-3800 C: 801-921-4583 www.primarydata.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] NFS: Always wait for I/O completion before unlock 2017-02-17 19:00 ` Trond Myklebust @ 2017-02-17 19:15 ` Benjamin Coddington 2017-02-17 19:30 ` Trond Myklebust 0 siblings, 1 reply; 11+ messages in thread From: Benjamin Coddington @ 2017-02-17 19:15 UTC (permalink / raw) To: Trond Myklebust; +Cc: anna.schumaker, linux-nfs On 17 Feb 2017, at 14:00, Trond Myklebust wrote: > On Fri, 2017-02-17 at 13:46 -0500, Benjamin Coddington wrote: >> NFS attempts to wait for read and write completion before unlocking >> in >> order to ensure that the data returned was protected by the >> lock. When >> this waiting is interrupted by a signal, the unlock may never be >> sent, and >> messages similar to the following are seen in the kernel ring buffer: >> >> [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3: >> [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 >> fl_pid=20183 >> [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 >> fl_pid=20185 >> >> For NFSv3, the missing unlock will cause the server to refuse >> conflicting >> locks indefinitely. For NFSv4, the leftover lock will be removed by >> the >> server after the lease timeout. >> >> This patch fixes this for NFSv3 by skipping the wait in order to >> immediately send the unlock if the FL_CLOSE flag is set when >> signaled. For >> NFSv4, this approach may cause the server to see the I/O as arriving >> with >> an old stateid, so, for the v4 case the fix is different: the wait on >> I/O >> completion is moved into nfs4_locku_ops' rpc_call_prepare(). This >> will >> cause the sleep to happen in rpciod context, and a signal to the >> originally >> waiting process will not cause the unlock to be skipped. > > NACK. I/O waits in rpciod contexts are NOT acceptable. rpciod is part > of the memory reclaim chain, so having it sleep on I/O is deadlock > prone. 
> > Why is there a need to wait for I/O completion in the first place if > the user has killed the task that held the lock? 'kill -9' will cause > corruption; that's a fact that no amount of paper will cover over. To avoid an unnecessary recovery situation where the server asks us to resend I/O due to an invalid stateid. I'm fine with skipping the wait on signaled FL_CLOSE if the that's an acceptable trade-off. I can send a v3. Ben >> >> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> >> --- >> fs/nfs/file.c | 13 ------------- >> fs/nfs/nfs3proc.c | 13 +++++++++++++ >> fs/nfs/nfs4proc.c | 7 +++++++ >> fs/nfs/pagelist.c | 1 + >> 4 files changed, 21 insertions(+), 13 deletions(-) >> >> diff --git a/fs/nfs/file.c b/fs/nfs/file.c >> index a490f45df4db..adc67fe762e3 100644 >> --- a/fs/nfs/file.c >> +++ b/fs/nfs/file.c >> @@ -684,7 +684,6 @@ static int >> do_unlk(struct file *filp, int cmd, struct file_lock *fl, int >> is_local) >> { >> struct inode *inode = filp->f_mapping->host; >> - struct nfs_lock_context *l_ctx; >> int status; >> >> /* >> @@ -693,18 +692,6 @@ do_unlk(struct file *filp, int cmd, struct >> file_lock *fl, int is_local) >> */ >> vfs_fsync(filp, 0); >> >> - l_ctx = nfs_get_lock_context(nfs_file_open_context(filp)); >> - if (!IS_ERR(l_ctx)) { >> - status = nfs_iocounter_wait(l_ctx); >> - nfs_put_lock_context(l_ctx); >> - if (status < 0) >> - return status; >> - } >> - >> - /* NOTE: special case >> - * If we're signalled while cleaning up locks on >> process exit, we >> - * still need to complete the unlock. 
>> - */ >> /* >> * Use local locking if mounted with "-onolock" or with >> appropriate >> * "-olocal_lock=" >> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c >> index dc925b531f32..ec3f12571c82 100644 >> --- a/fs/nfs/nfs3proc.c >> +++ b/fs/nfs/nfs3proc.c >> @@ -869,6 +869,19 @@ static int >> nfs3_proc_lock(struct file *filp, int cmd, struct file_lock *fl) >> { >> struct inode *inode = file_inode(filp); >> + int status; >> + struct nfs_lock_context *l_ctx; >> + >> + /* For v3, always send the unlock on FL_CLOSE */ >> + if (fl->fl_type == F_UNLCK) { >> + l_ctx = >> nfs_get_lock_context(nfs_file_open_context(filp)); >> + if (!IS_ERR(l_ctx)) { >> + status = nfs_iocounter_wait(l_ctx); >> + nfs_put_lock_context(l_ctx); >> + if (status < 0 && !(fl->fl_flags & >> FL_CLOSE)) >> + return status; >> + } >> + } >> >> return nlmclnt_proc(NFS_SERVER(inode)->nlm_host, cmd, fl); >> } >> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c >> index 91f88bfbbe79..052b97fd653d 100644 >> --- a/fs/nfs/nfs4proc.c >> +++ b/fs/nfs/nfs4proc.c >> @@ -5906,6 +5906,7 @@ struct nfs4_unlockdata { >> struct nfs_locku_res res; >> struct nfs4_lock_state *lsp; >> struct nfs_open_context *ctx; >> + struct nfs_lock_context *l_ctx; >> struct file_lock fl; >> struct nfs_server *server; >> unsigned long timestamp; >> @@ -5930,6 +5931,7 @@ static struct nfs4_unlockdata >> *nfs4_alloc_unlockdata(struct file_lock *fl, >> atomic_inc(&lsp->ls_count); >> /* Ensure we don't close file until we're done freeing >> locks! 
*/ >> p->ctx = get_nfs_open_context(ctx); >> + p->l_ctx = nfs_get_lock_context(ctx); >> memcpy(&p->fl, fl, sizeof(p->fl)); >> p->server = NFS_SERVER(inode); >> return p; >> @@ -5988,6 +5990,11 @@ static void nfs4_locku_prepare(struct rpc_task >> *task, void *data) >> /* Note: exit _without_ running nfs4_locku_done */ >> goto out_no_action; >> } >> + >> + if (!IS_ERR(calldata->l_ctx)) { >> + nfs_iocounter_wait(calldata->l_ctx); >> + nfs_put_lock_context(calldata->l_ctx); >> + } >> calldata->timestamp = jiffies; >> if (nfs4_setup_sequence(calldata->server, >> &calldata->arg.seq_args, >> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c >> index 6e629b856a00..8bf3cefdaa96 100644 >> --- a/fs/nfs/pagelist.c >> +++ b/fs/nfs/pagelist.c >> @@ -114,6 +114,7 @@ nfs_iocounter_wait(struct nfs_lock_context >> *l_ctx) >> return wait_on_atomic_t(&l_ctx->io_count, >> nfs_wait_atomic_killable, >> TASK_KILLABLE); >> } >> +EXPORT_SYMBOL(nfs_iocounter_wait); >> >> /* >> * nfs_page_group_lock - lock the head of the page group > -- > > > > > > > > Trond Myklebust > Principal System Architect > 4300 El Camino Real | Suite 100 > Los Altos, CA 94022 > W: 650-422-3800 > C: 801-921-4583 > www.primarydata.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] NFS: Always wait for I/O completion before unlock
  2017-02-17 19:15           ` Benjamin Coddington
@ 2017-02-17 19:30             ` Trond Myklebust
  2017-02-17 20:10               ` Benjamin Coddington
  2017-02-21 13:56               ` Benjamin Coddington
  0 siblings, 2 replies; 11+ messages in thread
From: Trond Myklebust @ 2017-02-17 19:30 UTC (permalink / raw)
  To: bcodding; +Cc: anna.schumaker, linux-nfs

On Fri, 2017-02-17 at 14:15 -0500, Benjamin Coddington wrote:
> On 17 Feb 2017, at 14:00, Trond Myklebust wrote:
>
> > On Fri, 2017-02-17 at 13:46 -0500, Benjamin Coddington wrote:
> > > NFS attempts to wait for read and write completion before unlocking
> > > in order to ensure that the data returned was protected by the
> > > lock.  When this waiting is interrupted by a signal, the unlock may
> > > never be sent, and messages similar to the following are seen in
> > > the kernel ring buffer:
> > >
> > > [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3:
> > > [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 fl_pid=20183
> > > [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 fl_pid=20185
> > >
> > > For NFSv3, the missing unlock will cause the server to refuse
> > > conflicting locks indefinitely.  For NFSv4, the leftover lock will
> > > be removed by the server after the lease timeout.
> > >
> > > This patch fixes this for NFSv3 by skipping the wait in order to
> > > immediately send the unlock if the FL_CLOSE flag is set when
> > > signaled.  For NFSv4, this approach may cause the server to see the
> > > I/O as arriving with an old stateid, so, for the v4 case the fix is
> > > different: the wait on I/O completion is moved into nfs4_locku_ops'
> > > rpc_call_prepare().  This will cause the sleep to happen in rpciod
> > > context, and a signal to the originally waiting process will not
> > > cause the unlock to be skipped.
> >
> > NACK. I/O waits in rpciod contexts are NOT acceptable. rpciod is part
> > of the memory reclaim chain, so having it sleep on I/O is deadlock
> > prone.
> >
> > Why is there a need to wait for I/O completion in the first place if
> > the user has killed the task that held the lock? 'kill -9' will cause
> > corruption; that's a fact that no amount of paper will cover over.
>
> To avoid an unnecessary recovery situation where the server asks us to
> resend I/O due to an invalid stateid.
>

I agree we shouldn't recover in this situation. It would be better to
jettison the failed write, and invalidate the page. Can we make use of
nfs_wb_page_cancel() together with generic_error_remove_page()?

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] NFS: Always wait for I/O completion before unlock
  2017-02-17 19:30             ` Trond Myklebust
@ 2017-02-17 20:10               ` Benjamin Coddington
  2017-02-21 13:56               ` Benjamin Coddington
  1 sibling, 0 replies; 11+ messages in thread
From: Benjamin Coddington @ 2017-02-17 20:10 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: anna.schumaker, linux-nfs

On 17 Feb 2017, at 14:30, Trond Myklebust wrote:

> On Fri, 2017-02-17 at 14:15 -0500, Benjamin Coddington wrote:
>> To avoid an unnecessary recovery situation where the server asks us to
>> resend I/O due to an invalid stateid.
>
> I agree we shouldn't recover in this situation. It would be better to
> jettison the failed write, and invalidate the page. Can we make use of
> nfs_wb_page_cancel() together with generic_error_remove_page()?

I'll take a look..

Ben

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] NFS: Always wait for I/O completion before unlock
  2017-02-17 19:30             ` Trond Myklebust
  2017-02-17 20:10               ` Benjamin Coddington
@ 2017-02-21 13:56               ` Benjamin Coddington
  1 sibling, 0 replies; 11+ messages in thread
From: Benjamin Coddington @ 2017-02-21 13:56 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: anna.schumaker, linux-nfs

On 17 Feb 2017, at 14:30, Trond Myklebust wrote:

> On Fri, 2017-02-17 at 14:15 -0500, Benjamin Coddington wrote:
>> On 17 Feb 2017, at 14:00, Trond Myklebust wrote:
>>
>>> On Fri, 2017-02-17 at 13:46 -0500, Benjamin Coddington wrote:
>>>> NFS attempts to wait for read and write completion before unlocking
>>>> in order to ensure that the data returned was protected by the
>>>> lock.  When this waiting is interrupted by a signal, the unlock may
>>>> never be sent, and messages similar to the following are seen in
>>>> the kernel ring buffer:
>>>>
>>>> [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3:
>>>> [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 fl_pid=20183
>>>> [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 fl_pid=20185
>>>>
>>>> For NFSv3, the missing unlock will cause the server to refuse
>>>> conflicting locks indefinitely.  For NFSv4, the leftover lock will
>>>> be removed by the server after the lease timeout.
>>>>
>>>> This patch fixes this for NFSv3 by skipping the wait in order to
>>>> immediately send the unlock if the FL_CLOSE flag is set when
>>>> signaled.  For NFSv4, this approach may cause the server to see the
>>>> I/O as arriving with an old stateid, so, for the v4 case the fix is
>>>> different: the wait on I/O completion is moved into nfs4_locku_ops'
>>>> rpc_call_prepare().  This will cause the sleep to happen in rpciod
>>>> context, and a signal to the originally waiting process will not
>>>> cause the unlock to be skipped.
>>>
>>> NACK. I/O waits in rpciod contexts are NOT acceptable. rpciod is part
>>> of the memory reclaim chain, so having it sleep on I/O is deadlock
>>> prone.
>>>
>>> Why is there a need to wait for I/O completion in the first place if
>>> the user has killed the task that held the lock? 'kill -9' will cause
>>> corruption; that's a fact that no amount of paper will cover over.
>>
>> To avoid an unnecessary recovery situation where the server asks us to
>> resend I/O due to an invalid stateid.
>
> I agree we shouldn't recover in this situation. It would be better to
> jettison the failed write, and invalidate the page. Can we make use of
> nfs_wb_page_cancel() together with generic_error_remove_page()?

Probably we can piggy-back on NFS_LOCK_LOST, then -EIO would get passed
up and the page would make it into generic_error_remove_page().  Any
outstanding writes are likely already transmitted or scheduled to be
transmitted by now, and the error recovery path for incorrect stateids
doesn't intersect with nfs_wb_page_cancel(), rather it re-schedules the
RPC.

But, after looking at this further, I'm not sure how much work should be
done here.  It's a fairly unlikely situation already, and if we assert
that a fatal signal means writes don't have to complete at all, I don't
see the harm in having them complete outside the lock.  Adding extra
complexity to bypass recovery for this specific situation would be
optimal, but unnecessary.

Ben

^ permalink raw reply	[flat|nested] 11+ messages in thread
end of thread, other threads:[~2017-02-21 13:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-17 18:37 [PATCH 0/4] Skipped unlocks Benjamin Coddington
2017-02-17 18:37 ` [PATCH 1/4] NFS4: remove a redundant lock range check Benjamin Coddington
2017-02-17 18:37 ` [PATCH 2/4] NFS: Move the flock open mode check into nfs_flock() Benjamin Coddington
2017-02-17 18:37 ` [PATCH 3/4] locks: Set FL_CLOSE when removing flock locks on close() Benjamin Coddington
2017-02-17 18:37 ` [PATCH 4/4] NFS: Always wait for I/O completion before unlock Benjamin Coddington
2017-02-17 18:46 [PATCH v2 0/4] Skipped unlocks Benjamin Coddington
2017-02-17 18:46 ` [PATCH 4/4] NFS: Always wait for I/O completion before unlock Benjamin Coddington
2017-02-17 19:00   ` Trond Myklebust
2017-02-17 19:15     ` Benjamin Coddington
2017-02-17 19:30       ` Trond Myklebust
2017-02-17 20:10         ` Benjamin Coddington
2017-02-21 13:56         ` Benjamin Coddington