linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* nfs: possible sync issue between nfs_call_unlink <-> nfs_async_unlink_release
@ 2018-07-03 21:04 Rafael Tinoco
  2018-11-14  0:28 ` Rafael David Tinoco
  0 siblings, 1 reply; 2+ messages in thread
From: Rafael Tinoco @ 2018-07-03 21:04 UTC (permalink / raw)
  To: Al Viro
  Cc: Trond Myklebust, Anna Schumaker, linux-nfs, Linux Kernel, lkft-triage

BUG: https://bugs.linaro.org/show_bug.cgi?id=3731

During Linaro's Kernel Functional tests, we have observed the
following situation:

[   52.651490] DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)1UL))
[   52.651506] WARNING: CPU: 2 PID: 1457 at
./kernel/locking/rwsem.c:217 up_read_non_owner+0x5d/0x70
[   52.674398] Modules linked in: x86_pkg_temp_thermal fuse
[   52.679719] CPU: 2 PID: 1457 Comm: kworker/2:2 Not tainted 4.16.0 #1
[   52.687448] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[   52.694922] Workqueue: nfsiod rpc_async_release
[   52.699454] RIP: 0010:up_read_non_owner+0x5d/0x70
[   52.704157] RSP: 0018:ffff9cbf81a23dd0 EFLAGS: 00010282
[   52.709376] RAX: 0000000000000000 RBX: ffff8dc1983c76c0 RCX: 0000000000000000
[   52.716500] RDX: ffffffffbd2d26c9 RSI: 0000000000000001 RDI: ffffffffbd2d2889
[   52.723652] RBP: ffff9cbf81a23dd8 R08: 0000000000000000 R09: 0000000000000000
[   52.730782] R10: ffff9cbf81a23dd0 R11: 0000000000000000 R12: ffff8dc19abf8600
[   52.737906] R13: ffff8dc19b6c0000 R14: 0000000000000000 R15: ffff8dc19bacad80
[   52.745029] FS:  0000000000000000(0000) GS:ffff8dc1afd00000(0000)
knlGS:0000000000000000
[   52.753108] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   52.758845] CR2: 00007f33794665d8 CR3: 000000016c41e006 CR4: 00000000003606e0
[   52.765968] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   52.773091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   52.780215] Call Trace:
[   52.782695]  nfs_async_unlink_release+0x32/0x80
[   52.787220]  rpc_free_task+0x30/0x50
[   52.790789]  rpc_async_release+0x12/0x20
[   52.794707]  process_one_work+0x25e/0x660
[   52.798713]  worker_thread+0x4b/0x410
[   52.802377]  kthread+0x10d/0x140
[   52.805600]  ? rescuer_thread+0x3a0/0x3a0
[   52.809652]  ? kthread_create_worker_on_cpu+0x70/0x70
[   52.814702]  ? do_syscall_64+0x69/0x1b0
[   52.818540]  ret_from_fork+0x3a/0x50

TEST RUNS:

https://lkft.validation.linaro.org/scheduler/job/167146#L2361
https://lkft.validation.linaro.org/scheduler/job/177145#L883

COMMENTS:

This started happening after commit 5149cbac4235 (locking/rwsem: Add
DEBUG_RWSEMS to look for lock/unlock mismatches) introduced checks for
the semaphores.

After some investigations (in Bug: #3731), the feedback, and assumptions, are:

Function "nfs_rmdir()" acquires the write semaphore in order to
guarantee that no unlink operations will happen to inodes inside the
directory to be unlinked, making it safe to unlink a directory on NFS.
The parent's dentry (read) semaphore serve as a synchronization
(mutual exclusive) for rmdir (read/write), to guarantee there are no
files being unlinked inside the directory when this is being unlinked.

The logic "nfs_call_unlink() -> nfs_do_call_unlink()" acquires the
inode's (that was unlinked) parent dentry->rmdir_sem (read) until the
inode's nfs_unlinkdata (needed for silly unlink) is finally
asynchronously (rpc call) cleaned by nfs_async_unlink_release() call,
the same place where inode's parent dentry->rmdir_sem is released, as
originally described in commit: 884be175351e. Purpose of this is to
guarantee that the parent directory won't be unlinked while the inode
is being async unlinked.

Unfortunately it seems that there is some condition that either made
the read semaphore (from inode's parent dentry) to be released (by the
same async logic ?) before nfs_async_unlink_release() ran through RPC
call OR the inode was moved (rmdir_sem doesn't seem to be acquired in
other situations) to a different directory (caching a new dentry
without the read semaphore acquired), making the DEBUG_LOCKS_WARN_ON
to have caught this.

Thank you for reviewing this.
-- 
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: nfs: possible sync issue between nfs_call_unlink <-> nfs_async_unlink_release
  2018-07-03 21:04 nfs: possible sync issue between nfs_call_unlink <-> nfs_async_unlink_release Rafael Tinoco
@ 2018-11-14  0:28 ` Rafael David Tinoco
  0 siblings, 0 replies; 2+ messages in thread
From: Rafael David Tinoco @ 2018-11-14  0:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: rafael.tinoco, Waiman Long

On 7/3/18 6:04 PM, Rafael Tinoco wrote:
> BUG: https://bugs.linaro.org/show_bug.cgi?id=3731
> 
> During Linaro's Kernel Functional tests, we have observed the
> following situation:
> 
> [   52.651490] DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)1UL))
> [   52.651506] WARNING: CPU: 2 PID: 1457 at
> ./kernel/locking/rwsem.c:217 up_read_non_owner+0x5d/0x70
> [   52.674398] Modules linked in: x86_pkg_temp_thermal fuse
> [   52.679719] CPU: 2 PID: 1457 Comm: kworker/2:2 Not tainted 4.16.0 #1
> [   52.687448] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> 2.0b 07/27/2017
> [   52.694922] Workqueue: nfsiod rpc_async_release
> [   52.699454] RIP: 0010:up_read_non_owner+0x5d/0x70
> [   52.704157] RSP: 0018:ffff9cbf81a23dd0 EFLAGS: 00010282
> [   52.709376] RAX: 0000000000000000 RBX: ffff8dc1983c76c0 RCX: 0000000000000000
> [   52.716500] RDX: ffffffffbd2d26c9 RSI: 0000000000000001 RDI: ffffffffbd2d2889
> [   52.723652] RBP: ffff9cbf81a23dd8 R08: 0000000000000000 R09: 0000000000000000
> [   52.730782] R10: ffff9cbf81a23dd0 R11: 0000000000000000 R12: ffff8dc19abf8600
> [   52.737906] R13: ffff8dc19b6c0000 R14: 0000000000000000 R15: ffff8dc19bacad80
> [   52.745029] FS:  0000000000000000(0000) GS:ffff8dc1afd00000(0000)
> knlGS:0000000000000000
> [   52.753108] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   52.758845] CR2: 00007f33794665d8 CR3: 000000016c41e006 CR4: 00000000003606e0
> [   52.765968] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   52.773091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   52.780215] Call Trace:
> [   52.782695]  nfs_async_unlink_release+0x32/0x80
> [   52.787220]  rpc_free_task+0x30/0x50
> [   52.790789]  rpc_async_release+0x12/0x20
> [   52.794707]  process_one_work+0x25e/0x660
> [   52.798713]  worker_thread+0x4b/0x410
> [   52.802377]  kthread+0x10d/0x140
> [   52.805600]  ? rescuer_thread+0x3a0/0x3a0
> [   52.809652]  ? kthread_create_worker_on_cpu+0x70/0x70
> [   52.814702]  ? do_syscall_64+0x69/0x1b0
> [   52.818540]  ret_from_fork+0x3a/0x50

Full explanation: https://bugs.linaro.org/show_bug.cgi?id=3731#c15

Appears that:

commit 925b9cd1b89a94b7124d128c80dfc48f78a63098
Author: Waiman Long <longman@redhat.com>
Date:   Thu Sep 6 17:18:34 2018

     locking/rwsem: Make owner store task pointer of last owning reader

seem to have fixed this warning:

DEBUG_RWSEMS_WARN_ON() started differentiating unknown owners for 
semaphores, like in this case, when RPC function forces the release the 
dentry's semaphore to avoid issues with NFS filesystem logic (nfs_rmdir, 
the real user for the semaphore in question).

Thanks Waiman!
-
Rafael D. Tinoco
Linaro Kernel Validation

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-11-14  0:29 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-03 21:04 nfs: possible sync issue between nfs_call_unlink <-> nfs_async_unlink_release Rafael Tinoco
2018-11-14  0:28 ` Rafael David Tinoco

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).