* Files leak from nfsd in 4.7.1-rc1 (and more?)
@ 2016-06-07 15:37 Oleg Drokin
  2016-06-07 17:10 ` Jeff Layton
  0 siblings, 1 reply; 45+ messages in thread

From: Oleg Drokin @ 2016-06-07 15:37 UTC (permalink / raw)
To: J. Bruce Fields, Jeff Layton
Cc: linux-nfs, <linux-kernel@vger.kernel.org> Mailing List

Hello!

I've been trying to better understand this problem I was having where sometimes
a formerly NFS-exported mountpoint becomes unmountable (after nfsd stop).

I finally traced it to a leaked file descriptor that was allocated from
nfsd4_open()->nfsd4_process_open2()->nfs4_get_vfs_file()->nfsd_open().

Together with it, we also see leaked credentials allocated along the same path from
fh_verify(), and groups allocated from svcauth_unix_accept()->groups_alloc() that
are presumably used by the credentials.

Unfortunately I was not able to make total sense of the state handling in nfsd,
but it's clear that one of the file descriptors inside struct nfs4_file is
lost.

I added a patch like this (always a good idea; I'm surprised it was not
there already):

@@ -271,6 +274,9 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu)
 {
 	struct nfs4_file *fp = container_of(rcu, struct nfs4_file, fi_rcu);

+	WARN_ON(fp->fi_fds[0]);
+	WARN_ON(fp->fi_fds[1]);
+	WARN_ON(fp->fi_fds[2]);
 	kmem_cache_free(file_slab, fp);
 }

And when the problem is hit, I am also triggering this warning (always the one for fd[1]):

[ 3588.143002] ------------[ cut here ]------------
[ 3588.143662] WARNING: CPU: 5 PID: 9 at /home/green/bk/linux/fs/nfsd/nfs4state.c:278 nfsd4_free_file_rcu+0x65/0x80 [nfsd]
[ 3588.144947] Modules linked in: loop rpcsec_gss_krb5 joydev acpi_cpufreq tpm_tis i2c_piix4 tpm virtio_console pcspkr nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm floppy serio_raw virtio_blk
[ 3588.147135] CPU: 5 PID: 9 Comm: rcuos/0 Not tainted 4.7.0-rc1-vm-nfs+ #120
[ 3588.153826] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 3588.153830]  0000000000000286 00000000e2d5ccdf ffff88011965bd50 ffffffff814a11a5
[ 3588.153832]  0000000000000000 0000000000000000 ffff88011965bd90 ffffffff8108806b
[ 3588.153834]  0000011600000000 ffff8800c476a0b8 ffff8800c476a048 ffffffffc0110fc0
[ 3588.153834] Call Trace:
[ 3588.153839]  [<ffffffff814a11a5>] dump_stack+0x86/0xc1
[ 3588.153841]  [<ffffffff8108806b>] __warn+0xcb/0xf0
[ 3588.153852]  [<ffffffffc0110fc0>] ? trace_raw_output_fh_want_write+0x60/0x60 [nfsd]
[ 3588.153853]  [<ffffffff8108819d>] warn_slowpath_null+0x1d/0x20
[ 3588.153859]  [<ffffffffc0111025>] nfsd4_free_file_rcu+0x65/0x80 [nfsd]
[ 3588.153861]  [<ffffffff81109c65>] rcu_nocb_kthread+0x335/0x510
[ 3588.153862]  [<ffffffff81109baf>] ? rcu_nocb_kthread+0x27f/0x510
[ 3588.153863]  [<ffffffff81109930>] ? rcu_cpu_notify+0x3e0/0x3e0
[ 3588.153866]  [<ffffffff810af391>] kthread+0x101/0x120
[ 3588.153868]  [<ffffffff810e6c84>] ? trace_hardirqs_on_caller+0xf4/0x1b0
[ 3588.153871]  [<ffffffff8188b6af>] ret_from_fork+0x1f/0x40
[ 3588.153872]  [<ffffffff810af290>] ? kthread_create_on_node+0x250/0x250

release_all_access() seems to be doing the correct job of all that cleaning, so
there must be some other path that I do not quite see.

Hopefully you are more familiar with the code and can see the problem right away ;)

Bye,
Oleg

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-07 15:37 Files leak from nfsd in 4.7.1-rc1 (and more?) Oleg Drokin @ 2016-06-07 17:10 ` Jeff Layton 2016-06-07 17:30 ` Oleg Drokin 0 siblings, 1 reply; 45+ messages in thread From: Jeff Layton @ 2016-06-07 17:10 UTC (permalink / raw) To: Oleg Drokin, J. Bruce Fields Cc: linux-nfs, <linux-kernel@vger.kernel.org> Mailing List On Tue, 2016-06-07 at 11:37 -0400, Oleg Drokin wrote: > Hello! > > I've been trying to better understand this problem I was having where sometimes > a formerly NFS-exported mountpoint becomes unmountable (after nfsd stop). > > I finally traced it to a leaked filedescriptor that was allocated from > nfsd4_open()->nfsd4_process_open2()->nfs4_get_vfs_file()->nfsd_open(). > > Also together with it we see leaked credentials allocated along the same path from > fh_verify() and groups allocated from svcauth_unix_accept()->groups_alloc() that > are presumably used by the credentials. > > Unfortunately I was not able to make total sense out of the state handling in nfsd, > but it's clear that one of the file descriptors inside struct nfs4_file is > lost. 
I added a patch like this (always a good idea, so surprised it was not > there already): > @@ -271,6 +274,9 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu) > { > struct nfs4_file *fp = container_of(rcu, struct nfs4_file, fi_rcu); > > + WARN_ON(fp->fi_fds[0]); > + WARN_ON(fp->fi_fds[1]); > + WARN_ON(fp->fi_fds[2]); > kmem_cache_free(file_slab, fp); > } > > And when the problem is hit, I am also triggering (Always this one which is fd[1]) > [ 3588.143002] ------------[ cut here ]------------ > [ 3588.143662] WARNING: CPU: 5 PID: 9 at /home/green/bk/linux/fs/nfsd/nfs4state.c:278 nfsd4_free_file_rcu+0x65/0x80 [nfsd] > [ 3588.144947] Modules linked in: loop rpcsec_gss_krb5 joydev acpi_cpufreq tpm_tis i2c_piix4 tpm virtio_console pcspkr nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm floppy serio_raw virtio_blk > [ 3588.147135] CPU: 5 PID: 9 Comm: rcuos/0 Not tainted 4.7.0-rc1-vm-nfs+ #120 > [ 3588.153826] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > [ 3588.153830] 0000000000000286 00000000e2d5ccdf ffff88011965bd50 ffffffff814a11a5 > [ 3588.153832] 0000000000000000 0000000000000000 ffff88011965bd90 ffffffff8108806b > [ 3588.153834] 0000011600000000 ffff8800c476a0b8 ffff8800c476a048 ffffffffc0110fc0 > [ 3588.153834] Call Trace: > [ 3588.153839] [] dump_stack+0x86/0xc1 > [ 3588.153841] [] __warn+0xcb/0xf0 > [ 3588.153852] [] ? trace_raw_output_fh_want_write+0x60/0x60 [nfsd] > [ 3588.153853] [] warn_slowpath_null+0x1d/0x20 > [ 3588.153859] [] nfsd4_free_file_rcu+0x65/0x80 [nfsd] > [ 3588.153861] [] rcu_nocb_kthread+0x335/0x510 > [ 3588.153862] [] ? rcu_nocb_kthread+0x27f/0x510 > [ 3588.153863] [] ? rcu_cpu_notify+0x3e0/0x3e0 > [ 3588.153866] [] kthread+0x101/0x120 > [ 3588.153868] [] ? trace_hardirqs_on_caller+0xf4/0x1b0 > [ 3588.153871] [] ret_from_fork+0x1f/0x40 > [ 3588.153872] [] ? 
kthread_create_on_node+0x250/0x250
>
>
> release_all_access() seems to be doing correct job of all that cleaning, so
> there must be some other path that I do not quite see.
>
> Hopefully you are more familiar with the code and can see the problem right away ;)
>
> Bye,
> Oleg

Hmm...well I'm not seeing it right offhand, and haven't been able to
reproduce the problem so far after a couple of attempts by hand. What
sort of workload are you running before you see that warning pop?
--
Jeff Layton <jlayton@poochiereds.net>
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-07 17:10 ` Jeff Layton @ 2016-06-07 17:30 ` Oleg Drokin 2016-06-07 20:04 ` Jeff Layton 0 siblings, 1 reply; 45+ messages in thread From: Oleg Drokin @ 2016-06-07 17:30 UTC (permalink / raw) To: Jeff Layton Cc: J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List On Jun 7, 2016, at 1:10 PM, Jeff Layton wrote: > On Tue, 2016-06-07 at 11:37 -0400, Oleg Drokin wrote: >> Hello! >> >> I've been trying to better understand this problem I was having where sometimes >> a formerly NFS-exported mountpoint becomes unmountable (after nfsd stop). >> >> I finally traced it to a leaked filedescriptor that was allocated from >> nfsd4_open()->nfsd4_process_open2()->nfs4_get_vfs_file()->nfsd_open(). >> >> Also together with it we see leaked credentials allocated along the same path from >> fh_verify() and groups allocated from svcauth_unix_accept()->groups_alloc() that >> are presumably used by the credentials. >> >> Unfortunately I was not able to make total sense out of the state handling in nfsd, >> but it's clear that one of the file descriptors inside struct nfs4_file is >> lost. 
I added a patch like this (always a good idea, so surprised it was not >> there already): >> @@ -271,6 +274,9 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu) >> { >> struct nfs4_file *fp = container_of(rcu, struct nfs4_file, fi_rcu); >> >> + WARN_ON(fp->fi_fds[0]); >> + WARN_ON(fp->fi_fds[1]); >> + WARN_ON(fp->fi_fds[2]); >> kmem_cache_free(file_slab, fp); >> } >> >> And when the problem is hit, I am also triggering (Always this one which is fd[1]) >> [ 3588.143002] ------------[ cut here ]------------ >> [ 3588.143662] WARNING: CPU: 5 PID: 9 at /home/green/bk/linux/fs/nfsd/nfs4state.c:278 nfsd4_free_file_rcu+0x65/0x80 [nfsd] >> [ 3588.144947] Modules linked in: loop rpcsec_gss_krb5 joydev acpi_cpufreq tpm_tis i2c_piix4 tpm virtio_console pcspkr nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm floppy serio_raw virtio_blk >> [ 3588.147135] CPU: 5 PID: 9 Comm: rcuos/0 Not tainted 4.7.0-rc1-vm-nfs+ #120 >> [ 3588.153826] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 >> [ 3588.153830] 0000000000000286 00000000e2d5ccdf ffff88011965bd50 ffffffff814a11a5 >> [ 3588.153832] 0000000000000000 0000000000000000 ffff88011965bd90 ffffffff8108806b >> [ 3588.153834] 0000011600000000 ffff8800c476a0b8 ffff8800c476a048 ffffffffc0110fc0 >> [ 3588.153834] Call Trace: >> [ 3588.153839] [] dump_stack+0x86/0xc1 >> [ 3588.153841] [] __warn+0xcb/0xf0 >> [ 3588.153852] [] ? trace_raw_output_fh_want_write+0x60/0x60 [nfsd] >> [ 3588.153853] [] warn_slowpath_null+0x1d/0x20 >> [ 3588.153859] [] nfsd4_free_file_rcu+0x65/0x80 [nfsd] >> [ 3588.153861] [] rcu_nocb_kthread+0x335/0x510 >> [ 3588.153862] [] ? rcu_nocb_kthread+0x27f/0x510 >> [ 3588.153863] [] ? rcu_cpu_notify+0x3e0/0x3e0 >> [ 3588.153866] [] kthread+0x101/0x120 >> [ 3588.153868] [] ? trace_hardirqs_on_caller+0xf4/0x1b0 >> [ 3588.153871] [] ret_from_fork+0x1f/0x40 >> [ 3588.153872] [] ? 
kthread_create_on_node+0x250/0x250
>>
>>
>> release_all_access() seems to be doing correct job of all that cleaning, so
>> there must be some other path that I do not quite see.
>>
>> Hopefully you are more familiar with the code and can see the problem right away ;)
>
> Hmm...well I'm not seeing it right offhand, and haven't been able to
> reproduce the problem so far after a couple of attempts by hand. What
> sort of workload are you running before you see that warning pop?

Well, the workload is a crazed set of twisty, conflicting operations in a very limited
namespace (called racer.sh,
http://git.whamcloud.com/fs/lustre-release.git/tree/refs/heads/master:/lustre/tests/racer )

It does all sorts of crazy, unimaginable stuff that is hard to predict, and typically I
run it on several mountpoints all pointing to the same spot
(now 3 ways: one instance on a direct mount point in ext4, one on an nfs mount mounted
with mount -t nfs, and one with mount -t nfs4).
It must be hitting some rare race, or a race + operations combination, that leads to
the leak.
To reproduce, I run it like this until it stops (and it usually stops in
2 or 3 iterations, so it takes some time):

$ cat ~/stress.sh
#!/bin/bash

set -x

cd /home/green/git/lustre-release/lustre/tests/racer
dd if=/dev/zero of=/tmp/loop bs=1024k count=1024
mkfs.ext4 /tmp/loop
service rpcbind start
mount none /var/lib/nfs -t tmpfs
touch /var/lib/nfs/etab
service nfs-mountd start

while :; do

mount /tmp/loop /mnt/lustre -o loop || exit 1
mkdir /mnt/lustre/racer
service nfs-server start || exit 2
mount localhost:/mnt/lustre /mnt/nfs -t nfs -o nolock || exit 3
mount localhost:/ /mnt/nfs2 -t nfs4 || exit 4
DURATION=3600 sh racer.sh /mnt/nfs/racer &
DURATION=3600 sh racer.sh /mnt/nfs2/racer &
DURATION=3600 sh racer.sh /mnt/lustre/racer &
wait %1 %2 %3
umount /mnt/nfs || exit 5
umount /mnt/nfs2 || exit 6
service nfs-server stop || exit 7
umount /mnt/lustre || exit 8
e2fsck -n -f /tmp/loop || exit 9
done

(Please ignore all the Lustre references; the test is generic. I am running
off an nfs-readonly-root setup, so I want to reuse the existing mountpoint
locations; there is no Lustre in the picture here.)

With all the other problems from before fixed, it now typically stops either for
the leak discussed here, or for this other problem in nfs4 I described here:
http://marc.info/?l=linux-nfs&m=146518324823160&w=2
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-07 17:30 ` Oleg Drokin @ 2016-06-07 20:04 ` Jeff Layton 2016-06-07 23:39 ` Oleg Drokin 0 siblings, 1 reply; 45+ messages in thread From: Jeff Layton @ 2016-06-07 20:04 UTC (permalink / raw) To: Oleg Drokin Cc: J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List On Tue, 2016-06-07 at 13:30 -0400, Oleg Drokin wrote: > On Jun 7, 2016, at 1:10 PM, Jeff Layton wrote: > > > > > On Tue, 2016-06-07 at 11:37 -0400, Oleg Drokin wrote: > > > > > > Hello! > > > > > > I've been trying to better understand this problem I was having where sometimes > > > a formerly NFS-exported mountpoint becomes unmountable (after nfsd stop). > > > > > > I finally traced it to a leaked filedescriptor that was allocated from > > > nfsd4_open()->nfsd4_process_open2()->nfs4_get_vfs_file()->nfsd_open(). > > > > > > Also together with it we see leaked credentials allocated along the same path from > > > fh_verify() and groups allocated from svcauth_unix_accept()->groups_alloc() that > > > are presumably used by the credentials. > > > > > > Unfortunately I was not able to make total sense out of the state handling in nfsd, > > > but it's clear that one of the file descriptors inside struct nfs4_file is > > > lost. 
I added a patch like this (always a good idea, so surprised it was not > > > there already): > > > @@ -271,6 +274,9 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu) > > > { > > > struct nfs4_file *fp = container_of(rcu, struct nfs4_file, fi_rcu); > > > > > > + WARN_ON(fp->fi_fds[0]); > > > + WARN_ON(fp->fi_fds[1]); > > > + WARN_ON(fp->fi_fds[2]); > > > kmem_cache_free(file_slab, fp); > > > } > > > > > > And when the problem is hit, I am also triggering (Always this one which is fd[1]) > > > [ 3588.143002] ------------[ cut here ]------------ > > > [ 3588.143662] WARNING: CPU: 5 PID: 9 at /home/green/bk/linux/fs/nfsd/nfs4state.c:278 nfsd4_free_file_rcu+0x65/0x80 [nfsd] > > > [ 3588.144947] Modules linked in: loop rpcsec_gss_krb5 joydev acpi_cpufreq tpm_tis i2c_piix4 tpm virtio_console pcspkr nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm floppy serio_raw virtio_blk > > > [ 3588.147135] CPU: 5 PID: 9 Comm: rcuos/0 Not tainted 4.7.0-rc1-vm-nfs+ #120 > > > [ 3588.153826] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > > > [ 3588.153830] 0000000000000286 00000000e2d5ccdf ffff88011965bd50 ffffffff814a11a5 > > > [ 3588.153832] 0000000000000000 0000000000000000 ffff88011965bd90 ffffffff8108806b > > > [ 3588.153834] 0000011600000000 ffff8800c476a0b8 ffff8800c476a048 ffffffffc0110fc0 > > > [ 3588.153834] Call Trace: > > > [ 3588.153839] [] dump_stack+0x86/0xc1 > > > [ 3588.153841] [] __warn+0xcb/0xf0 > > > [ 3588.153852] [] ? trace_raw_output_fh_want_write+0x60/0x60 [nfsd] > > > [ 3588.153853] [] warn_slowpath_null+0x1d/0x20 > > > [ 3588.153859] [] nfsd4_free_file_rcu+0x65/0x80 [nfsd] > > > [ 3588.153861] [] rcu_nocb_kthread+0x335/0x510 > > > [ 3588.153862] [] ? rcu_nocb_kthread+0x27f/0x510 > > > [ 3588.153863] [] ? rcu_cpu_notify+0x3e0/0x3e0 > > > [ 3588.153866] [] kthread+0x101/0x120 > > > [ 3588.153868] [] ? trace_hardirqs_on_caller+0xf4/0x1b0 > > > [ 3588.153871] [] ret_from_fork+0x1f/0x40 > > > [ 3588.153872] [] ? 
kthread_create_on_node+0x250/0x250 > > > > > > > > > release_all_access() seems to be doing correct job of all that cleaning, so > > > there must be some other path that I do not quite see. > > > > > > Hopefully you are more familiar with the code and can see the problem right away ;) > > Hmm...well I'm not seeing it right offhand, and haven't been able to > > reproduce the problem so far after a couple of attempts by hand. What > > sort of workload are you running before you see that warning pop? > Well, the workload is a crazed twisty conflicting operations set in a very limited > namespace (called racer.sh, > http://git.whamcloud.com/fs/lustrerelease.git/tree/refs/heads/master:/lustre/tests/racer ) > > It does all sorts of crazy unimaginable stuff that is hard to predict, and typically I > run it on several mountpoints all pointing to the same spot > (now 3-ways: one instance on direct mount point in ext4, one on nfs mount mounted as > mount -o nfs and one on mount -o nfs4). > It must be hitting some rare race or a race + operations combination that leads to > the leak. 
>
> To reproduce I run it like this until it stops (and it usually stops in
> 2 or 3 iterations, so it takes some time):
> $ cat ~/stress.sh
> #!/bin/bash
>
> set -x
>
> cd /home/green/git/lustre-release/lustre/tests/racer
> dd if=/dev/zero of=/tmp/loop bs=1024k count=1024
> mkfs.ext4 /tmp/loop
> service rpcbind start
> mount none /var/lib/nfs -t tmpfs
> touch /var/lib/nfs/etab
> service nfs-mountd start
>
> while :; do
>
> mount /tmp/loop /mnt/lustre -o loop || exit 1
> mkdir /mnt/lustre/racer
> service nfs-server start || exit 2
> mount localhost:/mnt/lustre /mnt/nfs -t nfs -o nolock || exit 3
> mount localhost:/ /mnt/nfs2 -t nfs4 || exit 4
> DURATION=3600 sh racer.sh /mnt/nfs/racer &
> DURATION=3600 sh racer.sh /mnt/nfs2/racer &
> DURATION=3600 sh racer.sh /mnt/lustre/racer &
> wait %1 %2 %3
> umount /mnt/nfs || exit 5
> umount /mnt/nfs2 || exit 6
> service nfs-server stop || exit 7
> umount /mnt/lustre || exit 8
> e2fsck -n -f /tmp/loop || exit 9
> done
>
> Please ignore all the Lustre references, the test is generic, it's just I am running
> off nfs-readonly-root setup, so want to reuse existing mountpoint locations,
> there's no Lustre in picture here).
> With all the other problems from before fixed, now it typically stops either for
> this leak discussed here, or for this other problem in nfs4 I described here:
> http://marc.info/?l=linux-nfs&m=146518324823160&w=2

Thanks. So it seems likely we're looking at a race of some sort then...

So, the way this is supposed to work is that the stateids each hold a
reference to the nfs4_file. They also take an fi_access reference for
the open modes that they represent. This is how we know when it's OK to
close one of the files in the fd[] slots. It might be interesting to
also print out the fi_access counters if the slot hasn't been zeroed
out, if you feel up to it.

I think though that the open upgrade accounting may be wrong.
__nfs4_file_get_access does an unconditional atomic_inc, so if the
client does an open for write and then opens again for read/write, then
we may end up with too many write fi_access references taken.

That said, this code is quite subtle. I'd need to look over it in more
detail before I offer up any fixes. I'd also appreciate it if anyone
else wants to sanity check my analysis there.
--
Jeff Layton <jlayton@poochiereds.net>
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-07 20:04 ` Jeff Layton @ 2016-06-07 23:39 ` Oleg Drokin 2016-06-08 0:03 ` Jeff Layton 0 siblings, 1 reply; 45+ messages in thread From: Oleg Drokin @ 2016-06-07 23:39 UTC (permalink / raw) To: Jeff Layton Cc: J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List On Jun 7, 2016, at 4:04 PM, Jeff Layton wrote: > On Tue, 2016-06-07 at 13:30 -0400, Oleg Drokin wrote: >> On Jun 7, 2016, at 1:10 PM, Jeff Layton wrote: >> >>> >>> On Tue, 2016-06-07 at 11:37 -0400, Oleg Drokin wrote: >>>> >>>> Hello! >>>> >>>> I've been trying to better understand this problem I was having where sometimes >>>> a formerly NFS-exported mountpoint becomes unmountable (after nfsd stop). >>>> >>>> I finally traced it to a leaked filedescriptor that was allocated from >>>> nfsd4_open()->nfsd4_process_open2()->nfs4_get_vfs_file()->nfsd_open(). >>>> >>>> Also together with it we see leaked credentials allocated along the same path from >>>> fh_verify() and groups allocated from svcauth_unix_accept()->groups_alloc() that >>>> are presumably used by the credentials. >>>> >>>> Unfortunately I was not able to make total sense out of the state handling in nfsd, >>>> but it's clear that one of the file descriptors inside struct nfs4_file is >>>> lost. 
I added a patch like this (always a good idea, so surprised it was not >>>> there already): >>>> @@ -271,6 +274,9 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu) >>>> { >>>> struct nfs4_file *fp = container_of(rcu, struct nfs4_file, fi_rcu); >>>> >>>> + WARN_ON(fp->fi_fds[0]); >>>> + WARN_ON(fp->fi_fds[1]); >>>> + WARN_ON(fp->fi_fds[2]); >>>> kmem_cache_free(file_slab, fp); >>>> } >>>> >>>> And when the problem is hit, I am also triggering (Always this one which is fd[1]) >>>> [ 3588.143002] ------------[ cut here ]------------ >>>> [ 3588.143662] WARNING: CPU: 5 PID: 9 at /home/green/bk/linux/fs/nfsd/nfs4state.c:278 nfsd4_free_file_rcu+0x65/0x80 [nfsd] >>>> [ 3588.144947] Modules linked in: loop rpcsec_gss_krb5 joydev acpi_cpufreq tpm_tis i2c_piix4 tpm virtio_console pcspkr nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm floppy serio_raw virtio_blk >>>> [ 3588.147135] CPU: 5 PID: 9 Comm: rcuos/0 Not tainted 4.7.0-rc1-vm-nfs+ #120 >>>> [ 3588.153826] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 >>>> [ 3588.153830] 0000000000000286 00000000e2d5ccdf ffff88011965bd50 ffffffff814a11a5 >>>> [ 3588.153832] 0000000000000000 0000000000000000 ffff88011965bd90 ffffffff8108806b >>>> [ 3588.153834] 0000011600000000 ffff8800c476a0b8 ffff8800c476a048 ffffffffc0110fc0 >>>> [ 3588.153834] Call Trace: >>>> [ 3588.153839] [] dump_stack+0x86/0xc1 >>>> [ 3588.153841] [] __warn+0xcb/0xf0 >>>> [ 3588.153852] [] ? trace_raw_output_fh_want_write+0x60/0x60 [nfsd] >>>> [ 3588.153853] [] warn_slowpath_null+0x1d/0x20 >>>> [ 3588.153859] [] nfsd4_free_file_rcu+0x65/0x80 [nfsd] >>>> [ 3588.153861] [] rcu_nocb_kthread+0x335/0x510 >>>> [ 3588.153862] [] ? rcu_nocb_kthread+0x27f/0x510 >>>> [ 3588.153863] [] ? rcu_cpu_notify+0x3e0/0x3e0 >>>> [ 3588.153866] [] kthread+0x101/0x120 >>>> [ 3588.153868] [] ? trace_hardirqs_on_caller+0xf4/0x1b0 >>>> [ 3588.153871] [] ret_from_fork+0x1f/0x40 >>>> [ 3588.153872] [] ? 
kthread_create_on_node+0x250/0x250 >>>> >>>> >>>> release_all_access() seems to be doing correct job of all that cleaning, so >>>> there must be some other path that I do not quite see. >>>> >>>> Hopefully you are more familiar with the code and can see the problem right away ;) >>> Hmm...well I'm not seeing it right offhand, and haven't been able to >>> reproduce the problem so far after a couple of attempts by hand. What >>> sort of workload are you running before you see that warning pop? >> Well, the workload is a crazed twisty conflicting operations set in a very limited >> namespace (called racer.sh, >> http://git.whamcloud.com/fs/lustrerelease.git/tree/refs/heads/master:/lustre/tests/racer ) >> >> It does all sorts of crazy unimaginable stuff that is hard to predict, and typically I >> run it on several mountpoints all pointing to the same spot >> (now 3-ways: one instance on direct mount point in ext4, one on nfs mount mounted as >> mount -o nfs and one on mount -o nfs4). >> It must be hitting some rare race or a race + operations combination that leads to >> the leak. 
>> >> To reproduce I run it like this until it stops (and it usually stops in >> 2 or 3 iterations, so it takes some time): >> $ cat ~/stress.sh >> #!/bin/bash >> >> set -x >> >> cd /home/green/git/lustre-release/lustre/tests/racer >> dd if=/dev/zero of=/tmp/loop bs=1024k count=1024 >> mkfs.ext4 /tmp/loop >> service rpcbind start >> mount none /var/lib/nfs -t tmpfs >> touch /var/lib/nfs/etab >> service nfs-mountd start >> >> while :; do >> >> mount /tmp/loop /mnt/lustre -o loop || exit 1 >> mkdir /mnt/lustre/racer >> service nfs-server start || exit 2 >> mount localhost:/mnt/lustre /mnt/nfs -t nfs -o nolock || exit 3 >> mount localhost:/ /mnt/nfs2 -t nfs4 || exit 4 >> DURATION=3600 sh racer.sh /mnt/nfs/racer & >> DURATION=3600 sh racer.sh /mnt/nfs2/racer & >> DURATION=3600 sh racer.sh /mnt/lustre/racer & >> wait %1 %2 %3 >> umount /mnt/nfs || exit 5 >> umount /mnt/nfs2 || exit 6 >> service nfs-server stop || exit 7 >> umount /mnt/lustre || exit 8 >> e2fsck -n -f /tmp/loop || exit 9 >> done >> >> Please ignore all the Lustre references, the test is generic, it's just I am running >> off nfs-readonly-root setup, so want to reuse existing mountpoint locations, >> there's no Lustre in picture here). >> With all the other problems from before fixed, now it typically stops either for >> this leak discussed here, or for this other problem in nfs4 I described here: >> http://marc.info/?l=linux-nfs&m=146518324823160&w=2 > > Thanks. So it seems likely we're looking at a race of some sort then... > > So, the way this is supposed to work is that the stateids each hold a > reference to the nfs4_file. They also take an fi_access reference for > the open modes that they represent. This is how we know when it's OK to > close one of the files in the fd[] slots. It might be interesting to > also print out the fi_access counters if the slot hasn't been zeroed > out if you feel up to it. Ok, I have primed that and will let you know next time it triggers. 
> I think though that the open upgrade accounting may be wrong.
> __nfs4_file_get_access does an unconditional atomic_inc, so if the
> client does a open for write and then opens again for read/write, then
> we may end up with too many write fi_access references taken.

The read-write access is all fine, I think, because when dropping read or write
it also checks whether the readwrite slot is filled.
Also, whoever did that atomic_inc for get_access would eventually do the matching put,
and that should free the file, no?
I was mostly thinking there's an error exit path somewhere that forgets to do
the put.
That would also explain the other pieces leaked that are not directly attached
to the nfs4_file.

> That said, this code is quite subtle. I'd need to look over it in more
> detail before I offer up any fixes. I'd also appreciate it if anyone
> else wants to sanity check my analysis there.
>
> --
> Jeff Layton <jlayton@poochiereds.net>
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-07 23:39 ` Oleg Drokin @ 2016-06-08 0:03 ` Jeff Layton 2016-06-08 0:46 ` Oleg Drokin 2016-06-08 2:22 ` Oleg Drokin 0 siblings, 2 replies; 45+ messages in thread From: Jeff Layton @ 2016-06-08 0:03 UTC (permalink / raw) To: Oleg Drokin Cc: J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List On Tue, 2016-06-07 at 19:39 -0400, Oleg Drokin wrote: > On Jun 7, 2016, at 4:04 PM, Jeff Layton wrote: > > > > > On Tue, 2016-06-07 at 13:30 -0400, Oleg Drokin wrote: > > > > > > On Jun 7, 2016, at 1:10 PM, Jeff Layton wrote: > > > > > > > > > > > > > > > On Tue, 2016-06-07 at 11:37 -0400, Oleg Drokin wrote: > > > > > > > > > > > > > > > Hello! > > > > > > > > > > I've been trying to better understand this problem I was having where sometimes > > > > > a formerly NFS-exported mountpoint becomes unmountable (after nfsd stop). > > > > > > > > > > I finally traced it to a leaked filedescriptor that was allocated from > > > > > nfsd4_open()->nfsd4_process_open2()->nfs4_get_vfs_file()->nfsd_open(). > > > > > > > > > > Also together with it we see leaked credentials allocated along the same path from > > > > > fh_verify() and groups allocated from svcauth_unix_accept()->groups_alloc() that > > > > > are presumably used by the credentials. > > > > > > > > > > Unfortunately I was not able to make total sense out of the state handling in nfsd, > > > > > but it's clear that one of the file descriptors inside struct nfs4_file is > > > > > lost. 
I added a patch like this (always a good idea, so surprised it was not > > > > > there already): > > > > > @@ -271,6 +274,9 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu) > > > > > { > > > > > struct nfs4_file *fp = container_of(rcu, struct nfs4_file, fi_rcu); > > > > > > > > > > + WARN_ON(fp->fi_fds[0]); > > > > > + WARN_ON(fp->fi_fds[1]); > > > > > + WARN_ON(fp->fi_fds[2]); > > > > > kmem_cache_free(file_slab, fp); > > > > > } > > > > > > > > > > And when the problem is hit, I am also triggering (Always this one which is fd[1]) > > > > > [ 3588.143002] ------------[ cut here ]------------ > > > > > [ 3588.143662] WARNING: CPU: 5 PID: 9 at /home/green/bk/linux/fs/nfsd/nfs4state.c:278 nfsd4_free_file_rcu+0x65/0x80 [nfsd] > > > > > [ 3588.144947] Modules linked in: loop rpcsec_gss_krb5 joydev acpi_cpufreq tpm_tis i2c_piix4 tpm virtio_console pcspkr nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm floppy serio_raw virtio_blk > > > > > [ 3588.147135] CPU: 5 PID: 9 Comm: rcuos/0 Not tainted 4.7.0-rc1-vm-nfs+ #120 > > > > > [ 3588.153826] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > > > > > [ 3588.153830] 0000000000000286 00000000e2d5ccdf ffff88011965bd50 ffffffff814a11a5 > > > > > [ 3588.153832] 0000000000000000 0000000000000000 ffff88011965bd90 ffffffff8108806b > > > > > [ 3588.153834] 0000011600000000 ffff8800c476a0b8 ffff8800c476a048 ffffffffc0110fc0 > > > > > [ 3588.153834] Call Trace: > > > > > [ 3588.153839] [] dump_stack+0x86/0xc1 > > > > > [ 3588.153841] [] __warn+0xcb/0xf0 > > > > > [ 3588.153852] [] ? trace_raw_output_fh_want_write+0x60/0x60 [nfsd] > > > > > [ 3588.153853] [] warn_slowpath_null+0x1d/0x20 > > > > > [ 3588.153859] [] nfsd4_free_file_rcu+0x65/0x80 [nfsd] > > > > > [ 3588.153861] [] rcu_nocb_kthread+0x335/0x510 > > > > > [ 3588.153862] [] ? rcu_nocb_kthread+0x27f/0x510 > > > > > [ 3588.153863] [] ? rcu_cpu_notify+0x3e0/0x3e0 > > > > > [ 3588.153866] [] kthread+0x101/0x120 > > > > > [ 3588.153868] [] ? 
trace_hardirqs_on_caller+0xf4/0x1b0 > > > > > [ 3588.153871] [] ret_from_fork+0x1f/0x40 > > > > > [ 3588.153872] [] ? kthread_create_on_node+0x250/0x250 > > > > > > > > > > > > > > > release_all_access() seems to be doing correct job of all that cleaning, so > > > > > there must be some other path that I do not quite see. > > > > > > > > > > Hopefully you are more familiar with the code and can see the problem right away ;) > > > > Hmm...well I'm not seeing it right offhand, and haven't been able to > > > > reproduce the problem so far after a couple of attempts by hand. What > > > > sort of workload are you running before you see that warning pop? > > > Well, the workload is a crazed twisty conflicting operations set in a very limited > > > namespace (called racer.sh, > > > http://git.whamcloud.com/fs/lustrerelease.git/tree/refs/heads/master:/lustre/tests/racer ) > > > > > > It does all sorts of crazy unimaginable stuff that is hard to predict, and typically I > > > run it on several mountpoints all pointing to the same spot > > > (now 3-ways: one instance on direct mount point in ext4, one on nfs mount mounted as > > > mount -o nfs and one on mount -o nfs4). > > > It must be hitting some rare race or a race + operations combination that leads to > > > the leak. 
> > > > > > To reproduce I run it like this until it stops (and it usually stops in > > > 2 or 3 iterations, so it takes some time): > > > $ cat ~/stress.sh > > > #!/bin/bash > > > > > > set -x > > > > > > cd /home/green/git/lustre-release/lustre/tests/racer > > > dd if=/dev/zero of=/tmp/loop bs=1024k count=1024 > > > mkfs.ext4 /tmp/loop > > > service rpcbind start > > > mount none /var/lib/nfs -t tmpfs > > > touch /var/lib/nfs/etab > > > service nfs-mountd start > > > > > > while :; do > > > > > > mount /tmp/loop /mnt/lustre -o loop || exit 1 > > > mkdir /mnt/lustre/racer > > > service nfs-server start || exit 2 > > > mount localhost:/mnt/lustre /mnt/nfs -t nfs -o nolock || exit 3 > > > mount localhost:/ /mnt/nfs2 -t nfs4 || exit 4 > > > DURATION=3600 sh racer.sh /mnt/nfs/racer & > > > DURATION=3600 sh racer.sh /mnt/nfs2/racer & > > > DURATION=3600 sh racer.sh /mnt/lustre/racer & > > > wait %1 %2 %3 > > > umount /mnt/nfs || exit 5 > > > umount /mnt/nfs2 || exit 6 > > > service nfs-server stop || exit 7 > > > umount /mnt/lustre || exit 8 > > > e2fsck -n -f /tmp/loop || exit 9 > > > done > > > > > > Please ignore all the Lustre references, the test is generic, it's just I am running > > > off nfs-readonly-root setup, so want to reuse existing mountpoint locations, > > > there's no Lustre in picture here). > > > With all the other problems from before fixed, now it typically stops either for > > > this leak discussed here, or for this other problem in nfs4 I described here: > > > http://marc.info/?l=linux-nfs&m=146518324823160&w=2 > > Thanks. So it seems likely we're looking at a race of some sort then... > > > > So, the way this is supposed to work is that the stateids each hold a > > reference to the nfs4_file. They also take an fi_access reference for > > the open modes that they represent. This is how we know when it's OK to > > close one of the files in the fd[] slots. 
It might be interesting to > > also print out the fi_access counters if the slot hasn't been zeroed > > out if you feel up to it. > Ok, I have primed that and will let you know next time it triggers. > Thanks. Maybe that will give us some idea of where to look... > > > > I think though that the open upgrade accounting may be wrong. > > __nfs4_file_get_access does an unconditional atomic_inc, so if the > > client does a open for write and then opens again for read/write, then > > we may end up with too many write fi_access references taken. > > The read-write access is all fine I think, because when dropping read or write > it also checks if the readwrite is filled? > Also whoever did that atomic_inc for get_access would eventually do put, > and that should free the file, no? > I was mostly thinking there's an error exit path somewhere that forgets to do > the put. > This also would explain the other pieces leaked that are not directly attached > into the nfs4_file. > > > > > That said, this code is quite subtle. I'd need to look over it in more > > detail before I offer up any fixes. I'd also appreciate it if anyone > > else wants to sanity check my analysis there. > > Yeah, I think you're right. It's fine since r/w opens have a distinct slot, even though the refcounting just tracks the number of read and write references. So yeah, the leak probably is in an error path someplace, or maybe a race someplace. -- Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-08 0:03 ` Jeff Layton @ 2016-06-08 0:46 ` Oleg Drokin 2016-06-08 2:22 ` Oleg Drokin 1 sibling, 0 replies; 45+ messages in thread From: Oleg Drokin @ 2016-06-08 0:46 UTC (permalink / raw) To: Jeff Layton Cc: J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List On Jun 7, 2016, at 8:03 PM, Jeff Layton wrote: > On Tue, 2016-06-07 at 19:39 -0400, Oleg Drokin wrote: >> On Jun 7, 2016, at 4:04 PM, Jeff Layton wrote: >>> So, the way this is supposed to work is that the stateids each hold a >>> reference to the nfs4_file. They also take an fi_access reference for >>> the open modes that they represent. This is how we know when it's OK to >>> close one of the files in the fd[] slots. It might be interesting to >>> also print out the fi_access counters if the slot hasn't been zeroed >>> out if you feel up to it. >> Ok, I have primed that and will let you know next time it triggers. > Thanks. Maybe that will give us some idea of where to look… Well, the counters are: 0 1 0 at the warning site, which is what makes sense, really. >>> I think though that the open upgrade accounting may be wrong. >>> __nfs4_file_get_access does an unconditional atomic_inc, so if the >>> client does a open for write and then opens again for read/write, then >>> we may end up with too many write fi_access references taken. >> >> The read-write access is all fine I think, because when dropping read or write >> it also checks if the readwrite is filled? >> Also whoever did that atomic_inc for get_access would eventually do put, >> and that should free the file, no? >> I was mostly thinking there's an error exit path somewhere that forgets to do >> the put. >> This also would explain the other pieces leaked that are not directly attached >> into the nfs4_file. >> >>> >>> That said, this code is quite subtle. I'd need to look over it in more >>> detail before I offer up any fixes. 
I'd also appreciate it if anyone >>> else wants to sanity check my analysis there. >>> > > Yeah, I think you're right. It's fine since r/w opens have a distinct > slot, even though the refcounting just tracks the number of read and > write references. So yeah, the leak probably is in an error path > someplace, or maybe a race someplace. I guess I'll try to put some tracers in the code to try and match refcount takers vs refcount droppers for lack of better ideas. Hopefully that'll lead us to see who got the count that was not dropped. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-08 0:03 ` Jeff Layton 2016-06-08 0:46 ` Oleg Drokin @ 2016-06-08 2:22 ` Oleg Drokin 2016-06-08 3:55 ` Oleg Drokin 2016-06-08 10:58 ` Jeff Layton 1 sibling, 2 replies; 45+ messages in thread From: Oleg Drokin @ 2016-06-08 2:22 UTC (permalink / raw) To: Jeff Layton Cc: J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List On Jun 7, 2016, at 8:03 PM, Jeff Layton wrote: >>> That said, this code is quite subtle. I'd need to look over it in more >>> detail before I offer up any fixes. I'd also appreciate it if anyone >>> else wants to sanity check my analysis there. >>> > Yeah, I think you're right. It's fine since r/w opens have a distinct > slot, even though the refcounting just tracks the number of read and > write references. So yeah, the leak probably is in an error path > someplace, or maybe a race someplace. So I noticed that set_access is always called locked, but clear_access is not, this does not sound right. So I placed this strategic WARN_ON: @@ -3991,6 +4030,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, goto out_put_access; spin_lock(&fp->fi_lock); if (!fp->fi_fds[oflag]) { +WARN_ON(!test_access(open->op_share_access, stp)); fp->fi_fds[oflag] = filp; filp = NULL; This is right in the place where nfsd set the access flag already, discovered that the file is not opened and went on to open it, yet some parallel thread came in and cleared the flag by the time we got the file opened. It did trigger (but there are 30 minutes left till test finish, so I don't know yet if this will correspond to the problem at hand yet, so below is speculation). 
Now, at the exit from this function, the state will not have access for this file set, and the file would be leaked, since the matching call, probably in release_all_access(), does:

	if (test_access(i, stp))
		nfs4_file_put_access(stp->st_stid.sc_file, i);
	clear_access(i, stp);

Meaning that the refcount is not dropped because no access flag is set.

I imagine we can just set the flag back right after setting fp->fi_fds[oflag], and that particular race would be gone?

Would that be OK, or is it just wrong that somebody can come and drop access bits like this (delegation downgrade seems to be the most likely suspect) while they are in the middle of being acquired?

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-08 2:22 ` Oleg Drokin @ 2016-06-08 3:55 ` Oleg Drokin 2016-06-08 10:58 ` Jeff Layton 1 sibling, 0 replies; 45+ messages in thread From: Oleg Drokin @ 2016-06-08 3:55 UTC (permalink / raw) To: Jeff Layton Cc: J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List On Jun 7, 2016, at 10:22 PM, Oleg Drokin wrote: > > On Jun 7, 2016, at 8:03 PM, Jeff Layton wrote: > >>>> That said, this code is quite subtle. I'd need to look over it in more >>>> detail before I offer up any fixes. I'd also appreciate it if anyone >>>> else wants to sanity check my analysis there. >>>> >> Yeah, I think you're right. It's fine since r/w opens have a distinct >> slot, even though the refcounting just tracks the number of read and >> write references. So yeah, the leak probably is in an error path >> someplace, or maybe a race someplace. > > So I noticed that set_access is always called locked, but clear_access is not, > this does not sound right. > > So I placed this strategic WARN_ON: > @@ -3991,6 +4030,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, > goto out_put_access; > spin_lock(&fp->fi_lock); > if (!fp->fi_fds[oflag]) { > +WARN_ON(!test_access(open->op_share_access, stp)); > fp->fi_fds[oflag] = filp; > filp = NULL; > > This is right in the place where nfsd set the access flag already, discovered > that the file is not opened and went on to open it, yet some parallel thread > came in and cleared the flag by the time we got the file opened. > It did trigger (but there are 30 minutes left till test finish, so I don't > know yet if this will correspond to the problem at hand yet, so below is speculation). Duh, I looked for a warning, but did not cross reference, and it was not this one that hit yet. Though apparently I am hitting some of the "impossible" warnings, so you might want to look into that anyway. 
	status = nfsd4_process_open2(rqstp, resfh, open);
	WARN(status && open->op_created,
	     "nfsd4_process_open2 failed to open newly-created file! status=%u\n",
	     be32_to_cpu(status));

and

	filp = find_readable_file(fp);
	if (!filp) {
		/* We should always have a readable file here */
		WARN_ON_ONCE(1);
		locks_free_lock(fl);
		return -EBADF;
	}

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-08 2:22 ` Oleg Drokin 2016-06-08 3:55 ` Oleg Drokin @ 2016-06-08 10:58 ` Jeff Layton 2016-06-08 14:44 ` Oleg Drokin ` (2 more replies) 1 sibling, 3 replies; 45+ messages in thread From: Jeff Layton @ 2016-06-08 10:58 UTC (permalink / raw) To: Oleg Drokin Cc: J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List On Tue, 2016-06-07 at 22:22 -0400, Oleg Drokin wrote: > On Jun 7, 2016, at 8:03 PM, Jeff Layton wrote: > > > > > That said, this code is quite subtle. I'd need to look over it in more > > > > detail before I offer up any fixes. I'd also appreciate it if anyone > > > > else wants to sanity check my analysis there. > > > > > > Yeah, I think you're right. It's fine since r/w opens have a distinct > > slot, even though the refcounting just tracks the number of read and > > write references. So yeah, the leak probably is in an error path > > someplace, or maybe a race someplace. > > So I noticed that set_access is always called locked, but clear_access is not, > this does not sound right. > > So I placed this strategic WARN_ON: > @@ -3991,6 +4030,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, > goto out_put_access; > spin_lock(&fp->fi_lock); > if (!fp->fi_fds[oflag]) { > +WARN_ON(!test_access(open->op_share_access, stp)); > fp->fi_fds[oflag] = filp; > filp = NULL; > > This is right in the place where nfsd set the access flag already, discovered > that the file is not opened and went on to open it, yet some parallel thread > came in and cleared the flag by the time we got the file opened. > It did trigger (but there are 30 minutes left till test finish, so I don't > know yet if this will correspond to the problem at hand yet, so below is speculation). 
> > Now, at the exit from this function, the state will not have access for this file > set and the file would be leaked, since the matching call, probably in > release_all_access() does: > if (test_access(i, stp)) > nfs4_file_put_access(stp->st_stid.sc_file, i); > clear_access(i, stp); > > Meaning that the refcount is not dropped due to no access flag set. > > I imagine we can just set the flag back right after setting fp->fi_fds[oflag] > and that particular race would be gone? > > Would that be ok, or is it just wrong that somebody can come and drop > (delegation downgrade seems to be the most likely suspect) access bits like this > while they are in the middle of being acquired? > I think you're on the right track... Operations that morph the stateid's seqid should be serialized via the st_rwsem. An OPEN_DOWNGRADE or CLOSE takes an exclusive lock on the rwsem, but we do allow OPENs to run in parallel with each other, under the assumption that on success you always get at least what you requested. So, I don't think that OPEN can really race with an OPEN_DOWNGRADE or CLOSE like you're thinking. But...suppose we have two racing OPEN calls. They're both in nfs4_get_vfs_file. One opens the file and succeeds and the other fails and ends up in out_put_access. At that point, you could end up clobbering the successful update to st_access_bmap from the other task, and we'd end up not putting the file access references in release_all_access. A simple way to confirm that might be to convert all of the read locks on the st_rwsem to write locks. That will serialize all of the open operations and should prevent that particular race from occurring. If that works, we'd probably want to fix it in a less heavy-handed way, but I'd have to think about how best to do that. -- Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-08 10:58 ` Jeff Layton @ 2016-06-08 14:44 ` Oleg Drokin 2016-06-08 16:10 ` Oleg Drokin 2016-06-09 12:13 ` Files leak from nfsd in 4.7.1-rc1 (and more?) Andrew W Elble 2 siblings, 0 replies; 45+ messages in thread From: Oleg Drokin @ 2016-06-08 14:44 UTC (permalink / raw) To: Jeff Layton Cc: J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List On Jun 8, 2016, at 6:58 AM, Jeff Layton wrote: > On Tue, 2016-06-07 at 22:22 -0400, Oleg Drokin wrote: >> On Jun 7, 2016, at 8:03 PM, Jeff Layton wrote: >> >>>>> That said, this code is quite subtle. I'd need to look over it in more >>>>> detail before I offer up any fixes. I'd also appreciate it if anyone >>>>> else wants to sanity check my analysis there. >>>>> >>> Yeah, I think you're right. It's fine since r/w opens have a distinct >>> slot, even though the refcounting just tracks the number of read and >>> write references. So yeah, the leak probably is in an error path >>> someplace, or maybe a race someplace. >> >> So I noticed that set_access is always called locked, but clear_access is not, >> this does not sound right. >> >> So I placed this strategic WARN_ON: >> @@ -3991,6 +4030,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, >> goto out_put_access; >> spin_lock(&fp->fi_lock); >> if (!fp->fi_fds[oflag]) { >> +WARN_ON(!test_access(open->op_share_access, stp)); >> fp->fi_fds[oflag] = filp; >> filp = NULL; >> >> This is right in the place where nfsd set the access flag already, discovered >> that the file is not opened and went on to open it, yet some parallel thread >> came in and cleared the flag by the time we got the file opened. >> It did trigger (but there are 30 minutes left till test finish, so I don't >> know yet if this will correspond to the problem at hand yet, so below is speculation). 
>> >> Now, at the exit from this function, the state will not have access for this file >> set and the file would be leaked, since the matching call, probably in >> release_all_access() does: >> if (test_access(i, stp)) >> nfs4_file_put_access(stp->st_stid.sc_file, i); >> clear_access(i, stp); >> >> Meaning that the refcount is not dropped due to no access flag set. >> >> I imagine we can just set the flag back right after setting fp->fi_fds[oflag] >> and that particular race would be gone? >> >> Would that be ok, or is it just wrong that somebody can come and drop >> (delegation downgrade seems to be the most likely suspect) access bits like this >> while they are in the middle of being acquired? >> > > I think you're on the right track... > > Operations that morph the stateid's seqid should be serialized via the > st_rwsem. An OPEN_DOWNGRADE or CLOSE takes an exclusive lock on the > rwsem, but we do allow OPENs to run in parallel with each other, under > the assumption that on success you always get at least what you > requested. So, I don't think that OPEN can really race with an > OPEN_DOWNGRADE or CLOSE like you're thinking. > > But...suppose we have two racing OPEN calls. They're both in > nfs4_get_vfs_file. One opens the file and succeeds and the other fails > and ends up in out_put_access. At that point, you could end up > clobbering the successful update to st_access_bmap from the other task, > and we'd end up not putting the file access references in > release_all_access. It looks like the race plays in reverse to what I thought. My assert when setting fi_fds to not NULL never hit, but I also had this warning below and that DID hit and is a 100% correlation to the later leak. 
set_access(u32 access, struct nfs4_ol_stateid *stp)
{
	unsigned char mask = 1 << access;

	WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH);
	WARN_ON(stp->st_access_bmap & mask);
	stp->st_access_bmap |= mask;
}

So we are setting an access bit that is already set (this comes from nfs4_get_vfs_file() via nfs4_upgrade_open()) twice; then one of the callers drops theirs, and so the subsequent state does not drop its file reference in release_all_access() because it does not see the access bit, something like that. Though I don't see why it cannot play out like I originally envisioned, too.

Hm... So if it always hits when the set_access warning triggers, and there is no error on either path, then there is no explicit call to nfs4_file_put_access(), and that's a leak, since release_all_access() would only release the file once?

I wonder if instead, on the success exit path, we could check the old_access_bmap and, if our current access is also set there, release it? This is OK, right? If the other caller succeeded, they still keep their reference. If the other caller failed, they drop theirs, restore their idea of the access bmap (which presumably does not have the bit set), and drop their reference anyway.

The question is whether both fail, or only they fail. If both fail and we win the race on restoring the old access, we might set the access wrongly, because they had it correct. Similarly, if they fail and we don't, they can reset our access.

> A simple way to confirm that might be to convert all of the read locks
> on the st_rwsem to write locks. That will serialize all of the open
> operations and should prevent that particular race from occurring.
> 
> If that works, we'd probably want to fix it in a less heavy-handed way,
> but I'd have to think about how best to do that.

Indeed this sounds a bit too heavy-handed, but I'll give it a try just to make sure.

Thanks.

Bye,
Oleg

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?)
  2016-06-08 10:58           ` Jeff Layton
  2016-06-08 14:44             ` Oleg Drokin
@ 2016-06-08 16:10             ` Oleg Drokin
  2016-06-08 17:22               ` Jeff Layton
  2016-06-09 12:13             ` Files leak from nfsd in 4.7.1-rc1 (and more?) Andrew W Elble
  2 siblings, 1 reply; 45+ messages in thread
From: Oleg Drokin @ 2016-06-08 16:10 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List

On Jun 8, 2016, at 6:58 AM, Jeff Layton wrote:

> A simple way to confirm that might be to convert all of the read locks
> on the st_rwsem to write locks. That will serialize all of the open
> operations and should prevent that particular race from occurring.
> 
> If that works, we'd probably want to fix it in a less heavy-handed way,
> but I'd have to think about how best to do that.

So I looked at the call sites for nfs4_get_vfs_file(); how about something like this:

After we grab the fp->fi_lock, we can do test_access(open->op_share_access, stp); if that returns true, just drop the spinlock and return EAGAIN.

The call site in nfs4_upgrade_open() would handle that by retesting the access map again and either coming back in or, more likely, reusing the now-updated stateid (synchronised by the fi_lock again). We probably need to convert the whole access-map testing there to be under fi_lock.

Something like:

nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, struct nfsd4_open *open)
{
	__be32 status;
	unsigned char old_deny_bmap = stp->st_deny_bmap;

again:
+	spin_lock(&fp->fi_lock);
	if (!test_access(open->op_share_access, stp)) {
+		spin_unlock(&fp->fi_lock);
+		status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open);
+		if (status == -EAGAIN)
+			goto again;
+		return status;
+	}

	/* test and set deny mode */
-	spin_lock(&fp->fi_lock);
	status = nfs4_file_check_deny(fp, open->op_share_deny);

The call in nfsd4_process_open2() I think cannot hit this condition, right?
We could probably add a WARN_ON there? BUG_ON? A more sensible approach?

Alternatively, we can probably always call nfs4_get_vfs_file() under this spinlock and just have it drop the lock for the open and then reacquire it (already done), though that is not as transparent, I guess.

Or fi_lock might be converted to, say, a mutex, so we can sleep with it held and then hold it across the whole invocation of nfs4_get_vfs_file() and the access testing.

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-08 16:10 ` Oleg Drokin @ 2016-06-08 17:22 ` Jeff Layton 2016-06-08 17:37 ` Oleg Drokin ` (2 more replies) 0 siblings, 3 replies; 45+ messages in thread From: Jeff Layton @ 2016-06-08 17:22 UTC (permalink / raw) To: Oleg Drokin Cc: J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List On Wed, 2016-06-08 at 12:10 -0400, Oleg Drokin wrote: > On Jun 8, 2016, at 6:58 AM, Jeff Layton wrote: > > > A simple way to confirm that might be to convert all of the read locks > > on the st_rwsem to write locks. That will serialize all of the open > > operations and should prevent that particular race from occurring. > > > > If that works, we'd probably want to fix it in a less heavy-handed way, > > but I'd have to think about how best to do that. > > So I looked at the call sites for nfs4_get_vfs_file(), how about something like this: > > after we grab the fp->fi_lock, we can do test_access(open->op_share_access, stp); > > If that returns true - just drop the spinlock and return EAGAIN. > > The callsite in nfs4_upgrade_open() would handle that by retesting the access map > again and either coming back in or more likely reusing the now updated stateid > (synchronised by the fi_lock again). > We probably need to convert the whole access map testing there to be under > fi_lock. 
> Something like: > nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, struct nfsd4_open *open) > { > __be32 status; > unsigned char old_deny_bmap = stp->st_deny_bmap; > > again: > + spin_lock(&fp->fi_lock); > if (!test_access(open->op_share_access, stp)) { > + spin_unlock(&fp->fi_lock); > + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open); > + if (status == -EAGAIN) > + goto again; > + return status; > + } > > /* test and set deny mode */ > - spin_lock(&fp->fi_lock); > status = nfs4_file_check_deny(fp, open->op_share_deny); > > > The call in nfsd4_process_open2() I think cannot hit this condition, right? > probably can add a WARN_ON there? BUG_ON? more sensible approach? > > Alternatively we can probably always call nfs4_get_vfs_file() under this spinlock, > just have it drop that for the open and then reobtain (already done), not as transparent I guess. > Yeah, I think that might be best. It looks like things could change after you drop the spinlock with the patch above. Since we have to retake it anyway in nfs4_get_vfs_file, we can just do it there. > Or the fi_lock might be converted to say a mutex, so we can sleep with it held and > then we can hold it across whole invocation of nfs4_get_vfs_file() and access testing and stuff. I think we'd be better off taking the st_rwsem for write (maybe just turning it into a mutex). That would at least be per-stateid instead of per-inode. That's a fine fix for now. It might slow down a client slightly that is sending two stateid morphing operations in parallel, but they shouldn't affect each other. I'm liking that solution more and more here. Longer term, I think we need to further simplify OPEN handling. It has gotten better, but it's still really hard to follow currently (and is obviously error-prone). -- Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-08 17:22 ` Jeff Layton @ 2016-06-08 17:37 ` Oleg Drokin 2016-06-09 2:55 ` [PATCH] nfsd: Always lock state exclusively Oleg Drokin 2016-06-09 21:01 ` [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file Oleg Drokin 2 siblings, 0 replies; 45+ messages in thread From: Oleg Drokin @ 2016-06-08 17:37 UTC (permalink / raw) To: Jeff Layton Cc: J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List On Jun 8, 2016, at 1:22 PM, Jeff Layton wrote: > On Wed, 2016-06-08 at 12:10 -0400, Oleg Drokin wrote: >> On Jun 8, 2016, at 6:58 AM, Jeff Layton wrote: >> >>> A simple way to confirm that might be to convert all of the read locks >>> on the st_rwsem to write locks. That will serialize all of the open >>> operations and should prevent that particular race from occurring. >>> >>> If that works, we'd probably want to fix it in a less heavy-handed way, >>> but I'd have to think about how best to do that. >> >> So I looked at the call sites for nfs4_get_vfs_file(), how about something like this: >> >> after we grab the fp->fi_lock, we can do test_access(open->op_share_access, stp); >> >> If that returns true - just drop the spinlock and return EAGAIN. >> >> The callsite in nfs4_upgrade_open() would handle that by retesting the access map >> again and either coming back in or more likely reusing the now updated stateid >> (synchronised by the fi_lock again). >> We probably need to convert the whole access map testing there to be under >> fi_lock. 
>> Something like: >> nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, struct nfsd4_open *open) >> { >> __be32 status; >> unsigned char old_deny_bmap = stp->st_deny_bmap; >> >> again: >> + spin_lock(&fp->fi_lock); >> if (!test_access(open->op_share_access, stp)) { >> + spin_unlock(&fp->fi_lock); >> + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open); >> + if (status == -EAGAIN) >> + goto again; >> + return status; >> + } >> >> /* test and set deny mode */ >> - spin_lock(&fp->fi_lock); >> status = nfs4_file_check_deny(fp, open->op_share_deny); >> >> >> The call in nfsd4_process_open2() I think cannot hit this condition, right? >> probably can add a WARN_ON there? BUG_ON? more sensible approach? >> >> Alternatively we can probably always call nfs4_get_vfs_file() under this spinlock, >> just have it drop that for the open and then reobtain (already done), not as transparent I guess. >> > > Yeah, I think that might be best. It looks like things could change > after you drop the spinlock with the patch above. Since we have to > retake it anyway in nfs4_get_vfs_file, we can just do it there. > >> Or the fi_lock might be converted to say a mutex, so we can sleep with it held and >> then we can hold it across whole invocation of nfs4_get_vfs_file() and access testing and stuff. > > I think we'd be better off taking the st_rwsem for write (maybe just > turning it into a mutex). That would at least be per-stateid instead of > per-inode. That's a fine fix for now. > > It might slow down a client slightly that is sending two stateid > morphing operations in parallel, but they shouldn't affect each other. > I'm liking that solution more and more here. > Longer term, I think we need to further simplify OPEN handling. It has > gotten better, but it's still really hard to follow currently (and is > obviously error-prone). 
The conversion to always take the rwlock for write is holding up well so far (also, no other WARNs have triggered yet). I guess I'll do a patch converting it to a mutex, but also, separately, a patch that just holds fi_lock longer; I'll test that other one and, if all is well, submit it too, and let you choose which one you like the most ;)

^ permalink raw reply	[flat|nested] 45+ messages in thread
* [PATCH] nfsd: Always lock state exclusively.
  2016-06-08 17:22             ` Jeff Layton
  2016-06-08 17:37               ` Oleg Drokin
@ 2016-06-09  2:55               ` Oleg Drokin
  2016-06-09 10:13                 ` Jeff Layton
  2016-06-09 21:01               ` [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file Oleg Drokin
  2 siblings, 1 reply; 45+ messages in thread
From: Oleg Drokin @ 2016-06-09  2:55 UTC (permalink / raw)
  To: Jeff Layton, J. Bruce Fields; +Cc: linux-nfs, linux-kernel, Oleg Drokin

It used to be the case that the state had an rwlock that was locked for write
by downgrades, but for read by upgrades (opens). Well, the problem is that if
there are two competing opens for the same state, they step on each other's
toes, potentially leading to leaked file descriptors from the state structure,
since the access mode is a bitmap that is only set once.

This patch converts st_rwsem to st_mutex, and all users become exclusive:
no sharing.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
This holds up well in my testing.
I'll also try the other approach to see if it's any better.
 fs/nfsd/nfs4state.c | 40 ++++++++++++++++++++--------------------
 fs/nfsd/state.h     |  2 +-
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index f5f82e1..c927d36 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3502,7 +3502,7 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp,
 	stp->st_access_bmap = 0;
 	stp->st_deny_bmap = 0;
 	stp->st_openstp = NULL;
-	init_rwsem(&stp->st_rwsem);
+	mutex_init(&stp->st_mutex);
 	list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
 	list_add(&stp->st_perfile, &fp->fi_stateids);
 
@@ -4335,10 +4335,10 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
 	 */
 	if (stp) {
 		/* Stateid was found, this is an OPEN upgrade */
-		down_read(&stp->st_rwsem);
+		mutex_lock(&stp->st_mutex);
 		status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open);
 		if (status) {
-			up_read(&stp->st_rwsem);
+			mutex_unlock(&stp->st_mutex);
 			goto out;
 		}
 	} else {
@@ -4348,19 +4348,19 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
 		if (swapstp) {
 			nfs4_put_stid(&stp->st_stid);
 			stp = swapstp;
-			down_read(&stp->st_rwsem);
+			mutex_lock(&stp->st_mutex);
 			status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open);
 			if (status) {
-				up_read(&stp->st_rwsem);
+				mutex_unlock(&stp->st_mutex);
 				goto out;
 			}
 			goto upgrade_out;
 		}
-		down_read(&stp->st_rwsem);
+		mutex_lock(&stp->st_mutex);
 		status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
 		if (status) {
-			up_read(&stp->st_rwsem);
+			mutex_unlock(&stp->st_mutex);
 			release_open_stateid(stp);
 			goto out;
 		}
 
@@ -4372,7 +4372,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
 	}
 upgrade_out:
 	nfs4_inc_and_copy_stateid(&open->op_stateid, &stp->st_stid);
-	up_read(&stp->st_rwsem);
+	mutex_unlock(&stp->st_mutex);
 
 	if (nfsd4_has_session(&resp->cstate)) {
 		if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) {
@@ -4977,12 +4977,12 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_
 		 * revoked delegations are kept only for free_stateid.
 		 */
 		return nfserr_bad_stateid;
-	down_write(&stp->st_rwsem);
+	mutex_lock(&stp->st_mutex);
 	status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid,
 					  nfsd4_has_session(cstate));
 	if (status == nfs_ok)
 		status = nfs4_check_fh(current_fh, &stp->st_stid);
 	if (status != nfs_ok)
-		up_write(&stp->st_rwsem);
+		mutex_unlock(&stp->st_mutex);
 	return status;
 }
@@ -5030,7 +5030,7 @@ static __be32 nfs4_preprocess_confirmed_seqid_op(struct nfsd4_compound_state *cs
 		return status;
 	oo = openowner(stp->st_stateowner);
 	if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) {
-		up_write(&stp->st_rwsem);
+		mutex_unlock(&stp->st_mutex);
 		nfs4_put_stid(&stp->st_stid);
 		return nfserr_bad_stateid;
 	}
@@ -5062,12 +5062,12 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	oo = openowner(stp->st_stateowner);
 	status = nfserr_bad_stateid;
 	if (oo->oo_flags & NFS4_OO_CONFIRMED) {
-		up_write(&stp->st_rwsem);
+		mutex_unlock(&stp->st_mutex);
 		goto put_stateid;
 	}
 	oo->oo_flags |= NFS4_OO_CONFIRMED;
 	nfs4_inc_and_copy_stateid(&oc->oc_resp_stateid, &stp->st_stid);
-	up_write(&stp->st_rwsem);
+	mutex_unlock(&stp->st_mutex);
 	dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n",
 		__func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid));
@@ -5143,7 +5143,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp,
 	nfs4_inc_and_copy_stateid(&od->od_stateid, &stp->st_stid);
 	status = nfs_ok;
 put_stateid:
-	up_write(&stp->st_rwsem);
+	mutex_unlock(&stp->st_mutex);
 	nfs4_put_stid(&stp->st_stid);
 out:
 	nfsd4_bump_seqid(cstate, status);
@@ -5196,7 +5196,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (status)
 		goto out;
 	nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid);
-	up_write(&stp->st_rwsem);
+	mutex_unlock(&stp->st_mutex);
 
 	nfsd4_close_open_stateid(stp);
@@ -5422,7 +5422,7 @@ init_lock_stateid(struct nfs4_ol_stateid *stp, struct nfs4_lockowner *lo,
 	stp->st_access_bmap = 0;
 	stp->st_deny_bmap = open_stp->st_deny_bmap;
 	stp->st_openstp = open_stp;
-	init_rwsem(&stp->st_rwsem);
+	mutex_init(&stp->st_mutex);
 	list_add(&stp->st_locks, &open_stp->st_locks);
 	list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids);
 	spin_lock(&fp->fi_lock);
@@ -5591,7 +5591,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 					&open_stp, nn);
 		if (status)
 			goto out;
-		up_write(&open_stp->st_rwsem);
+		mutex_unlock(&open_stp->st_mutex);
 		open_sop = openowner(open_stp->st_stateowner);
 		status = nfserr_bad_stateid;
 		if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid,
@@ -5600,7 +5600,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		status = lookup_or_create_lock_state(cstate, open_stp, lock,
 						&lock_stp, &new);
 		if (status == nfs_ok)
-			down_write(&lock_stp->st_rwsem);
+			mutex_lock(&lock_stp->st_mutex);
 	} else {
 		status = nfs4_preprocess_seqid_op(cstate,
 				       lock->lk_old_lock_seqid,
@@ -5704,7 +5704,7 @@ out:
 	    seqid_mutating_err(ntohl(status)))
 		lock_sop->lo_owner.so_seqid++;
 
-	up_write(&lock_stp->st_rwsem);
+	mutex_unlock(&lock_stp->st_mutex);
 
 	/*
 	 * If this is a new, never-before-used stateid, and we are
@@ -5874,7 +5874,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 fput:
 	fput(filp);
 put_stateid:
-	up_write(&stp->st_rwsem);
+	mutex_unlock(&stp->st_mutex);
 	nfs4_put_stid(&stp->st_stid);
 out:
 	nfsd4_bump_seqid(cstate, status);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 986e51e..64053ea 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -535,7 +535,7 @@ struct nfs4_ol_stateid {
 	unsigned char			st_access_bmap;
 	unsigned char			st_deny_bmap;
 	struct nfs4_ol_stateid		*st_openstp;
-	struct rw_semaphore		st_rwsem;
+	struct mutex			st_mutex;
 };
 
 static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread
* Re: [PATCH] nfsd: Always lock state exclusively. 2016-06-09 2:55 ` [PATCH] nfsd: Always lock state exclusively Oleg Drokin @ 2016-06-09 10:13 ` Jeff Layton 0 siblings, 0 replies; 45+ messages in thread From: Jeff Layton @ 2016-06-09 10:13 UTC (permalink / raw) To: Oleg Drokin, J . Bruce Fields; +Cc: linux-nfs, linux-kernel On Wed, 2016-06-08 at 22:55 -0400, Oleg Drokin wrote: > It used to be the case that state had an rwlock that was locked for write > by downgrades, but for read for upgrades (opens). Well, the problem is > if there are two competing opens for the same state, they step on > each other toes potentially leading to leaking file descriptors > from the state structure, since access mode is a bitmap only set once. > This patch converts st_rwsem to st_mutex and all users become exclusive, > no sharing > > Signed-off-by: Oleg Drokin <green@linuxhacker.ru> > --- > This holds up well in my testing. > I'll also try the other approach to see if it's any better. > fs/nfsd/nfs4state.c | 40 ++++++++++++++++++++-------------------- > fs/nfsd/state.h | 2 +- > 2 files changed, 21 insertions(+), 21 deletions(-) > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index f5f82e1..c927d36 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -3502,7 +3502,7 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > stp->st_access_bmap = 0; > stp->st_deny_bmap = 0; > stp->st_openstp = NULL; > - init_rwsem(&stp->st_rwsem); > + mutex_init(&stp->st_mutex); > list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids); > list_add(&stp->st_perfile, &fp->fi_stateids); > > @@ -4335,10 +4335,10 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf > */ > if (stp) { > /* Stateid was found, this is an OPEN upgrade */ > - down_read(&stp->st_rwsem); > + mutex_lock(&stp->st_mutex); > status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); > if (status) { > - up_read(&stp->st_rwsem); > + 
mutex_unlock(&stp->st_mutex); > goto out; > } > } else { > @@ -4348,19 +4348,19 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf > if (swapstp) { > nfs4_put_stid(&stp->st_stid); > stp = swapstp; > - down_read(&stp->st_rwsem); > + mutex_lock(&stp->st_mutex); > status = nfs4_upgrade_open(rqstp, fp, current_fh, > stp, open); > if (status) { > - up_read(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > goto out; > } > goto upgrade_out; > } > - down_read(&stp->st_rwsem); > + mutex_lock(&stp->st_mutex); > status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); > if (status) { > - up_read(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > release_open_stateid(stp); > goto out; > } > @@ -4372,7 +4372,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf > } > upgrade_out: > nfs4_inc_and_copy_stateid(&open->op_stateid, &stp->st_stid); > - up_read(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > > if (nfsd4_has_session(&resp->cstate)) { > if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) { > @@ -4977,12 +4977,12 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_ > * revoked delegations are kept only for free_stateid. 
> */ > return nfserr_bad_stateid; > - down_write(&stp->st_rwsem); > + mutex_lock(&stp->st_mutex); > status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate)); > if (status == nfs_ok) > status = nfs4_check_fh(current_fh, &stp->st_stid); > if (status != nfs_ok) > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > return status; > } > > @@ -5030,7 +5030,7 @@ static __be32 nfs4_preprocess_confirmed_seqid_op(struct nfsd4_compound_state *cs > return status; > oo = openowner(stp->st_stateowner); > if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) { > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > nfs4_put_stid(&stp->st_stid); > return nfserr_bad_stateid; > } > @@ -5062,12 +5062,12 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > oo = openowner(stp->st_stateowner); > status = nfserr_bad_stateid; > if (oo->oo_flags & NFS4_OO_CONFIRMED) { > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > goto put_stateid; > } > oo->oo_flags |= NFS4_OO_CONFIRMED; > nfs4_inc_and_copy_stateid(&oc->oc_resp_stateid, &stp->st_stid); > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n", > __func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid)); > > @@ -5143,7 +5143,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp, > nfs4_inc_and_copy_stateid(&od->od_stateid, &stp->st_stid); > status = nfs_ok; > put_stateid: > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > nfs4_put_stid(&stp->st_stid); > out: > nfsd4_bump_seqid(cstate, status); > @@ -5196,7 +5196,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > if (status) > goto out; > nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid); > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > > nfsd4_close_open_stateid(stp); > > @@ -5422,7 +5422,7 @@ init_lock_stateid(struct nfs4_ol_stateid *stp, struct 
nfs4_lockowner *lo, > stp->st_access_bmap = 0; > stp->st_deny_bmap = open_stp->st_deny_bmap; > stp->st_openstp = open_stp; > - init_rwsem(&stp->st_rwsem); > + mutex_init(&stp->st_mutex); > list_add(&stp->st_locks, &open_stp->st_locks); > list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids); > spin_lock(&fp->fi_lock); > @@ -5591,7 +5591,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > &open_stp, nn); > if (status) > goto out; > - up_write(&open_stp->st_rwsem); > + mutex_unlock(&open_stp->st_mutex); > open_sop = openowner(open_stp->st_stateowner); > status = nfserr_bad_stateid; > if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid, > @@ -5600,7 +5600,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > status = lookup_or_create_lock_state(cstate, open_stp, lock, > &lock_stp, &new); > if (status == nfs_ok) > - down_write(&lock_stp->st_rwsem); > + mutex_lock(&lock_stp->st_mutex); > } else { > status = nfs4_preprocess_seqid_op(cstate, > lock->lk_old_lock_seqid, > @@ -5704,7 +5704,7 @@ out: > seqid_mutating_err(ntohl(status))) > lock_sop->lo_owner.so_seqid++; > > - up_write(&lock_stp->st_rwsem); > + mutex_unlock(&lock_stp->st_mutex); > > /* > * If this is a new, never-before-used stateid, and we are > @@ -5874,7 +5874,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > fput: > fput(filp); > put_stateid: > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > nfs4_put_stid(&stp->st_stid); > out: > nfsd4_bump_seqid(cstate, status); > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h > index 986e51e..64053ea 100644 > --- a/fs/nfsd/state.h > +++ b/fs/nfsd/state.h > @@ -535,7 +535,7 @@ struct nfs4_ol_stateid { > unsigned char st_access_bmap; > unsigned char st_deny_bmap; > struct nfs4_ol_stateid *st_openstp; > - struct rw_semaphore st_rwsem; > + struct mutex st_mutex; > }; > > static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s)

Looks legit.
Eventually we might want to turn this back into a rwsem by fixing and clarifying the locking around the st_access_bmap (and the deny bmap), but for now I think this is a reasonable fix for the problem Oleg found. There is the potential for a minor perf hit here when there are racing OPEN calls for the same inode, but I think that's acceptable for now. We may also want to go ahead and send this to stable as well.

Reviewed-by: Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file 2016-06-08 17:22 ` Jeff Layton 2016-06-08 17:37 ` Oleg Drokin 2016-06-09 2:55 ` [PATCH] nfsd: Always lock state exclusively Oleg Drokin @ 2016-06-09 21:01 ` Oleg Drokin 2016-06-10 4:18 ` Oleg Drokin 2 siblings, 1 reply; 45+ messages in thread From: Oleg Drokin @ 2016-06-09 21:01 UTC (permalink / raw) To: Jeff Layton, J . Bruce Fields; +Cc: linux-nfs, linux-kernel, Oleg Drokin Currently there's an unprotected access mode check in nfs4_upgrade_open that then calls nfs4_get_vfs_file, which in turn assumes whatever access mode was present in the state is still valid, which is racy. Two nfs4_get_vfs_file calls can enter the same path as a result and get two references to the nfs4_file, but the later drop only happens once because the access mode is denoted only by bits, with no refcounting. Locking around the access mode testing is introduced to avoid this race. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> --- This patch performs on par with the st_rwsem -> mutex conversion, but is a bit lighter-weight, I imagine. For one, it seems to allow truncates in parallel if we ever want that. fs/nfsd/nfs4state.c | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f5f82e1..d4b9eba 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, spin_lock(&fp->fi_lock); + if (test_access(open->op_share_access, stp)) { + spin_unlock(&fp->fi_lock); + return nfserr_eagain; + } + /* * Are we trying to set a deny mode that would conflict with * current access?
@@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *c __be32 status; unsigned char old_deny_bmap = stp->st_deny_bmap; - if (!test_access(open->op_share_access, stp)) - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open); +again: + spin_lock(&fp->fi_lock); + if (!test_access(open->op_share_access, stp)) { + spin_unlock(&fp->fi_lock); + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open); + /* + * Somebody won the race for access while we did not hold + * the lock here + */ + if (status == nfserr_eagain) + goto again; + return status; + } /* test and set deny mode */ - spin_lock(&fp->fi_lock); status = nfs4_file_check_deny(fp, open->op_share_deny); if (status == nfs_ok) { set_deny(open->op_share_deny, stp); @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); if (status) { up_read(&stp->st_rwsem); + /* + * EAGAIN is returned when there's a racing access, + * this should never happen as we are the only user + * of this new state, and since it's not yet hashed, + * nobody can find it + */ + WARN_ON(status == nfserr_eagain); release_open_stateid(stp); goto out; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file 2016-06-09 21:01 ` [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file Oleg Drokin @ 2016-06-10 4:18 ` Oleg Drokin 2016-06-10 10:50 ` Jeff Layton 0 siblings, 1 reply; 45+ messages in thread From: Oleg Drokin @ 2016-06-10 4:18 UTC (permalink / raw) To: Jeff Layton, J . Bruce Fields; +Cc: linux-nfs, linux-kernel On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote: > Currently there's an unprotected access mode check in nfs4_upgrade_open > that then calls nfs4_get_vfs_file which in turn assumes whatever > access mode was present in the state is still valid which is racy. > Two nfs4_get_vfs_file van enter the same path as result and get two > references to nfs4_file, but later drop would only happens once because > access mode is only denoted by bits, so no refcounting. > > The locking around access mode testing is introduced to avoid this race. > > Signed-off-by: Oleg Drokin <green@linuxhacker.ru> > --- > > This patch performs equally well to the st_rwsem -> mutex conversion, > but is a bit ligher-weight I imagine. > For one it seems to allow truncates in parallel if we ever want it. > > fs/nfsd/nfs4state.c | 28 +++++++++++++++++++++++++--- > 1 file changed, 25 insertions(+), 3 deletions(-) > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index f5f82e1..d4b9eba 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, > > spin_lock(&fp->fi_lock); > > + if (test_access(open->op_share_access, stp)) { > + spin_unlock(&fp->fi_lock); > + return nfserr_eagain; > + } > + > /* > * Are we trying to set a deny mode that would conflict with > * current access? 
> @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *c > __be32 status; > unsigned char old_deny_bmap = stp->st_deny_bmap; > > - if (!test_access(open->op_share_access, stp)) > - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open); > +again: > + spin_lock(&fp->fi_lock); > + if (!test_access(open->op_share_access, stp)) { > + spin_unlock(&fp->fi_lock); > + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open); > + /* > + * Somebody won the race for access while we did not hold > + * the lock here > + */ > + if (status == nfserr_eagain) > + goto again; > + return status; > + } > > /* test and set deny mode */ > - spin_lock(&fp->fi_lock); > status = nfs4_file_check_deny(fp, open->op_share_deny); > if (status == nfs_ok) { > set_deny(open->op_share_deny, stp); > @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf > status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); > if (status) { > up_read(&stp->st_rwsem); > + /* > + * EAGAIN is returned when there's a racing access, > + * this should never happen as we are the only user > + * of this new state, and since it's not yet hashed, > + * nobody can find it > + */ > + WARN_ON(status == nfserr_eagain); Ok, some more testing shows that this CAN happen. So this patch is inferior to the mutex one after all. > release_open_stateid(stp); > goto out; > } > -- > 2.7.4 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file 2016-06-10 4:18 ` Oleg Drokin @ 2016-06-10 10:50 ` Jeff Layton 2016-06-10 20:55 ` J . Bruce Fields 0 siblings, 1 reply; 45+ messages in thread From: Jeff Layton @ 2016-06-10 10:50 UTC (permalink / raw) To: Oleg Drokin, J . Bruce Fields; +Cc: linux-nfs, linux-kernel, Andrew W Elble On Fri, 2016-06-10 at 00:18 -0400, Oleg Drokin wrote: > On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote: > > > Currently there's an unprotected access mode check in > > nfs4_upgrade_open > > that then calls nfs4_get_vfs_file which in turn assumes whatever > > access mode was present in the state is still valid which is racy. > > Two nfs4_get_vfs_file van enter the same path as result and get two > > references to nfs4_file, but later drop would only happens once > > because > > access mode is only denoted by bits, so no refcounting. > > > > The locking around access mode testing is introduced to avoid this > > race. > > > > Signed-off-by: Oleg Drokin <green@linuxhacker.ru> > > --- > > > > This patch performs equally well to the st_rwsem -> mutex > > conversion, > > but is a bit ligher-weight I imagine. > > For one it seems to allow truncates in parallel if we ever want it. > > > > fs/nfsd/nfs4state.c | 28 +++++++++++++++++++++++++--- > > 1 file changed, 25 insertions(+), 3 deletions(-) > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > index f5f82e1..d4b9eba 100644 > > --- a/fs/nfsd/nfs4state.c > > +++ b/fs/nfsd/nfs4state.c > > @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct > > svc_rqst *rqstp, struct nfs4_file *fp, > > > > spin_lock(&fp->fi_lock); > > > > + if (test_access(open->op_share_access, stp)) { > > + spin_unlock(&fp->fi_lock); > > + return nfserr_eagain; > > + } > > + > > /* > > * Are we trying to set a deny mode that would conflict with > > * current access? 
> > @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, > > struct nfs4_file *fp, struct svc_fh *c > > __be32 status; > > unsigned char old_deny_bmap = stp->st_deny_bmap; > > > > - if (!test_access(open->op_share_access, stp)) > > - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, > > open); > > +again: > > + spin_lock(&fp->fi_lock); > > + if (!test_access(open->op_share_access, stp)) { > > + spin_unlock(&fp->fi_lock); > > + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, > > open); > > + /* > > + * Somebody won the race for access while we did > > not hold > > + * the lock here > > + */ > > + if (status == nfserr_eagain) > > + goto again; > > + return status; > > + } > > > > /* test and set deny mode */ > > - spin_lock(&fp->fi_lock); > > status = nfs4_file_check_deny(fp, open->op_share_deny); > > if (status == nfs_ok) { > > set_deny(open->op_share_deny, stp); > > @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, > > struct svc_fh *current_fh, struct nf > > status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, > > open); > > if (status) { > > up_read(&stp->st_rwsem); > > + /* > > + * EAGAIN is returned when there's a > > racing access, > > + * this should never happen as we are the > > only user > > + * of this new state, and since it's not > > yet hashed, > > + * nobody can find it > > + */ > > + WARN_ON(status == nfserr_eagain); > > Ok, some more testing shows that this CAN happen. > So this patch is inferior to the mutex one after all. > Yeah, that can happen for all sorts of reasons. As Andrew pointed out, you can get this when there is a lease break in progress, and that may be occurring for a completely different stateid (or because of samba, etc...) It may be possible to do something like this, but we'd need to audit all of the handling of st_access_bmap (and the deny bmap) to ensure that we get it right. For now, I think just turning that rwsem into a mutex is the best solution. 
That is a per-stateid mutex so any contention is going to be due to the client sending racing OPEN calls for the same inode anyway. Allowing those to run in parallel again could be useful in some cases, but most use-cases won't be harmed by that serialization. > > release_open_stateid(stp); > > goto out; > > } > > -- > > 2.7.4 > -- Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file 2016-06-10 10:50 ` Jeff Layton @ 2016-06-10 20:55 ` J . Bruce Fields 2016-06-11 15:41 ` Oleg Drokin 0 siblings, 1 reply; 45+ messages in thread From: J . Bruce Fields @ 2016-06-10 20:55 UTC (permalink / raw) To: Jeff Layton; +Cc: Oleg Drokin, linux-nfs, linux-kernel, Andrew W Elble On Fri, Jun 10, 2016 at 06:50:33AM -0400, Jeff Layton wrote: > On Fri, 2016-06-10 at 00:18 -0400, Oleg Drokin wrote: > > On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote: > > > > > Currently there's an unprotected access mode check in > > > nfs4_upgrade_open > > > that then calls nfs4_get_vfs_file which in turn assumes whatever > > > access mode was present in the state is still valid which is racy. > > > Two nfs4_get_vfs_file van enter the same path as result and get two > > > references to nfs4_file, but later drop would only happens once > > > because > > > access mode is only denoted by bits, so no refcounting. > > > > > > The locking around access mode testing is introduced to avoid this > > > race. > > > > > > Signed-off-by: Oleg Drokin <green@linuxhacker.ru> > > > --- > > > > > > This patch performs equally well to the st_rwsem -> mutex > > > conversion, > > > but is a bit ligher-weight I imagine. > > > For one it seems to allow truncates in parallel if we ever want it. 
> > > > > > fs/nfsd/nfs4state.c | 28 +++++++++++++++++++++++++--- > > > 1 file changed, 25 insertions(+), 3 deletions(-) > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > > index f5f82e1..d4b9eba 100644 > > > --- a/fs/nfsd/nfs4state.c > > > +++ b/fs/nfsd/nfs4state.c > > > @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct > > > svc_rqst *rqstp, struct nfs4_file *fp, > > > > > > spin_lock(&fp->fi_lock); > > > > > > + if (test_access(open->op_share_access, stp)) { > > > + spin_unlock(&fp->fi_lock); > > > + return nfserr_eagain; > > > + } > > > + > > > /* > > > * Are we trying to set a deny mode that would conflict with > > > * current access? > > > @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, > > > struct nfs4_file *fp, struct svc_fh *c > > > __be32 status; > > > unsigned char old_deny_bmap = stp->st_deny_bmap; > > > > > > - if (!test_access(open->op_share_access, stp)) > > > - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, > > > open); > > > +again: > > > + spin_lock(&fp->fi_lock); > > > + if (!test_access(open->op_share_access, stp)) { > > > + spin_unlock(&fp->fi_lock); > > > + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, > > > open); > > > + /* > > > + * Somebody won the race for access while we did > > > not hold > > > + * the lock here > > > + */ > > > + if (status == nfserr_eagain) > > > + goto again; > > > + return status; > > > + } > > > > > > /* test and set deny mode */ > > > - spin_lock(&fp->fi_lock); > > > status = nfs4_file_check_deny(fp, open->op_share_deny); > > > if (status == nfs_ok) { > > > set_deny(open->op_share_deny, stp); > > > @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, > > > struct svc_fh *current_fh, struct nf > > > status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, > > > open); > > > if (status) { > > > up_read(&stp->st_rwsem); > > > + /* > > > + * EAGAIN is returned when there's a > > > racing access, > > > + * this should never happen as we are the > > 
> only user > > > + * of this new state, and since it's not > > > yet hashed, > > > + * nobody can find it > > > + */ > > > + WARN_ON(status == nfserr_eagain); > > > > Ok, some more testing shows that this CAN happen. > > So this patch is inferior to the mutex one after all. > > > > Yeah, that can happen for all sorts of reasons. As Andrew pointed out, > you can get this when there is a lease break in progress, and that may > be occurring for a completely different stateid (or because of samba, > etc...) > > It may be possible to do something like this, but we'd need to audit > all of the handling of st_access_bmap (and the deny bmap) to ensure > that we get it right. > > For now, I think just turning that rwsem into a mutex is the best > solution. That is a per-stateid mutex so any contention is going to be > due to the client sending racing OPEN calls for the same inode anyway. > Allowing those to run in parallel again could be useful in some cases, > but most use-cases won't be harmed by that serialization. OK, so for now my plan is to take "nfsd: Always lock state exclusively" for 4.7. Thanks to both of you for your work on this.... --b. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file 2016-06-10 20:55 ` J . Bruce Fields @ 2016-06-11 15:41 ` Oleg Drokin 2016-06-12 1:33 ` Jeff Layton 0 siblings, 1 reply; 45+ messages in thread From: Oleg Drokin @ 2016-06-11 15:41 UTC (permalink / raw) To: J . Bruce Fields; +Cc: Jeff Layton, linux-nfs, linux-kernel, Andrew W Elble On Jun 10, 2016, at 4:55 PM, J . Bruce Fields wrote: > On Fri, Jun 10, 2016 at 06:50:33AM -0400, Jeff Layton wrote: >> On Fri, 2016-06-10 at 00:18 -0400, Oleg Drokin wrote: >>> On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote: >>> >>>> Currently there's an unprotected access mode check in >>>> nfs4_upgrade_open >>>> that then calls nfs4_get_vfs_file which in turn assumes whatever >>>> access mode was present in the state is still valid which is racy. >>>> Two nfs4_get_vfs_file van enter the same path as result and get two >>>> references to nfs4_file, but later drop would only happens once >>>> because >>>> access mode is only denoted by bits, so no refcounting. >>>> >>>> The locking around access mode testing is introduced to avoid this >>>> race. >>>> >>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> >>>> --- >>>> >>>> This patch performs equally well to the st_rwsem -> mutex >>>> conversion, >>>> but is a bit ligher-weight I imagine. >>>> For one it seems to allow truncates in parallel if we ever want it. 
>>>> >>>> fs/nfsd/nfs4state.c | 28 +++++++++++++++++++++++++--- >>>> 1 file changed, 25 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >>>> index f5f82e1..d4b9eba 100644 >>>> --- a/fs/nfsd/nfs4state.c >>>> +++ b/fs/nfsd/nfs4state.c >>>> @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct >>>> svc_rqst *rqstp, struct nfs4_file *fp, >>>> >>>> spin_lock(&fp->fi_lock); >>>> >>>> + if (test_access(open->op_share_access, stp)) { >>>> + spin_unlock(&fp->fi_lock); >>>> + return nfserr_eagain; >>>> + } >>>> + >>>> /* >>>> * Are we trying to set a deny mode that would conflict with >>>> * current access? >>>> @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, >>>> struct nfs4_file *fp, struct svc_fh *c >>>> __be32 status; >>>> unsigned char old_deny_bmap = stp->st_deny_bmap; >>>> >>>> - if (!test_access(open->op_share_access, stp)) >>>> - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, >>>> open); >>>> +again: >>>> + spin_lock(&fp->fi_lock); >>>> + if (!test_access(open->op_share_access, stp)) { >>>> + spin_unlock(&fp->fi_lock); >>>> + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, >>>> open); >>>> + /* >>>> + * Somebody won the race for access while we did >>>> not hold >>>> + * the lock here >>>> + */ >>>> + if (status == nfserr_eagain) >>>> + goto again; >>>> + return status; >>>> + } >>>> >>>> /* test and set deny mode */ >>>> - spin_lock(&fp->fi_lock); >>>> status = nfs4_file_check_deny(fp, open->op_share_deny); >>>> if (status == nfs_ok) { >>>> set_deny(open->op_share_deny, stp); >>>> @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, >>>> struct svc_fh *current_fh, struct nf >>>> status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, >>>> open); >>>> if (status) { >>>> up_read(&stp->st_rwsem); >>>> + /* >>>> + * EAGAIN is returned when there's a >>>> racing access, >>>> + * this should never happen as we are the >>>> only user >>>> + * of this new state, and since it's not 
>>>> yet hashed, >>>> + * nobody can find it >>>> + */ >>>> + WARN_ON(status == nfserr_eagain); >>> >>> Ok, some more testing shows that this CAN happen. >>> So this patch is inferior to the mutex one after all. >>> >> >> Yeah, that can happen for all sorts of reasons. As Andrew pointed out, >> you can get this when there is a lease break in progress, and that may >> be occurring for a completely different stateid (or because of samba, >> etc...) >> >> It may be possible to do something like this, but we'd need to audit >> all of the handling of st_access_bmap (and the deny bmap) to ensure >> that we get it right. >> >> For now, I think just turning that rwsem into a mutex is the best >> solution. That is a per-stateid mutex so any contention is going to be >> due to the client sending racing OPEN calls for the same inode anyway. >> Allowing those to run in parallel again could be useful in some cases, >> but most use-cases won't be harmed by that serialization. > > OK, so for now my plan is to take "nfsd: Always lock state exclusively" > for 4.7. Thanks to both of you for your work on this…. FYI, I just hit this again with the "Always lock state exclusively" patch too. I hate when that happens. But it is much harder to hit now. The trace is also in the nfs4_get_vfs_file() that's called directly from nfsd4_process_open2(). Otherwise the symptoms are pretty much the same - first I get the warning in set_access that the flag is already set, then the nfsd4_free_file_rcu() one, and then unmount of the underlying fs fails. What's strange is I am not sure what else can set the flag. Basically set_access is called from nfs4_get_vfs_file() - under the mutex via nfsd4_process_open2() directly or via nfs4_upgrade_open()..., or from get_lock_access() - without the mutex, but my workload does not do any file locking, so it should not really be hitting, right? Ah! I think I see it.
This patch has the same problem as the spinlock-moving one: when we call nfs4_get_vfs_file() directly from nfsd4_process_open2(), there's no check for anything. So suppose we take this direct calling path; there's a mutex_lock() just before the call, but what's to stop another thread from finding this stateid meanwhile, being first with the mutex, and then also setting the access mode that our first thread no longer sets? This means that in fact we can never skip the access mode testing unless testing and setting are done atomically under the same lock, which they are not now. The patch that extended the coverage of the fi_lock got that right; I did not hit any leaks there, just the "unhandled EAGAIN" WARN_ON, which is wrong in its own right. The surprising part is that a state that has not yet been through find_or_hash_clnt_odstate could still be found by somebody else? Is it really supposed to work like that? So if we are to check the access mode at all times in nfs4_get_vfs_file() and return eagain, should we just check for that (all under the same lock, if we go with the mutex patch) and call into nfs4_upgrade_open in that case again? Or I guess it's even better if we resurrect the fi_lock coverage extension patch and do it there, as that would mean at least the check and set are atomic wrt locking? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file 2016-06-11 15:41 ` Oleg Drokin @ 2016-06-12 1:33 ` Jeff Layton 2016-06-12 2:06 ` Oleg Drokin 0 siblings, 1 reply; 45+ messages in thread From: Jeff Layton @ 2016-06-12 1:33 UTC (permalink / raw) To: Oleg Drokin, J . Bruce Fields; +Cc: linux-nfs, linux-kernel, Andrew W Elble On Sat, 2016-06-11 at 11:41 -0400, Oleg Drokin wrote: > On Jun 10, 2016, at 4:55 PM, J . Bruce Fields wrote: > > > On Fri, Jun 10, 2016 at 06:50:33AM -0400, Jeff Layton wrote: > > > On Fri, 2016-06-10 at 00:18 -0400, Oleg Drokin wrote: > > > > On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote: > > > > > > > > > Currently there's an unprotected access mode check in > > > > > nfs4_upgrade_open > > > > > that then calls nfs4_get_vfs_file which in turn assumes whatever > > > > > access mode was present in the state is still valid which is racy. > > > > > Two nfs4_get_vfs_file van enter the same path as result and get two > > > > > references to nfs4_file, but later drop would only happens once > > > > > because > > > > > access mode is only denoted by bits, so no refcounting. > > > > > > > > > > The locking around access mode testing is introduced to avoid this > > > > > race. > > > > > > > > > > Signed-off-by: Oleg Drokin <green@linuxhacker.ru> > > > > > --- > > > > > > > > > > This patch performs equally well to the st_rwsem -> mutex > > > > > conversion, > > > > > but is a bit ligher-weight I imagine. > > > > > For one it seems to allow truncates in parallel if we ever want it. 
> > > > > > > > > > fs/nfsd/nfs4state.c | 28 +++++++++++++++++++++++++--- > > > > > 1 file changed, 25 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > > > > index f5f82e1..d4b9eba 100644 > > > > > --- a/fs/nfsd/nfs4state.c > > > > > +++ b/fs/nfsd/nfs4state.c > > > > > @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct > > > > > svc_rqst *rqstp, struct nfs4_file *fp, > > > > > > > > > > spin_lock(&fp->fi_lock); > > > > > > > > > > + if (test_access(open->op_share_access, stp)) { > > > > > + spin_unlock(&fp->fi_lock); > > > > > + return nfserr_eagain; > > > > > + } > > > > > + > > > > > /* > > > > > * Are we trying to set a deny mode that would conflict with > > > > > * current access? > > > > > @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, > > > > > struct nfs4_file *fp, struct svc_fh *c > > > > > __be32 status; > > > > > unsigned char old_deny_bmap = stp->st_deny_bmap; > > > > > > > > > > - if (!test_access(open->op_share_access, stp)) > > > > > - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, > > > > > open); > > > > > +again: > > > > > + spin_lock(&fp->fi_lock); > > > > > + if (!test_access(open->op_share_access, stp)) { > > > > > + spin_unlock(&fp->fi_lock); > > > > > + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, > > > > > open); > > > > > + /* > > > > > + * Somebody won the race for access while we did > > > > > not hold > > > > > + * the lock here > > > > > + */ > > > > > + if (status == nfserr_eagain) > > > > > + goto again; > > > > > + return status; > > > > > + } > > > > > > > > > > /* test and set deny mode */ > > > > > - spin_lock(&fp->fi_lock); > > > > > status = nfs4_file_check_deny(fp, open->op_share_deny); > > > > > if (status == nfs_ok) { > > > > > set_deny(open->op_share_deny, stp); > > > > > @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, > > > > > struct svc_fh *current_fh, struct nf > > > > > status = nfs4_get_vfs_file(rqstp, 
fp, current_fh, stp, > > > > > open); > > > > > if (status) { > > > > > up_read(&stp->st_rwsem); > > > > > + /* > > > > > + * EAGAIN is returned when there's a > > > > > racing access, > > > > > + * this should never happen as we are the > > > > > only user > > > > > + * of this new state, and since it's not > > > > > yet hashed, > > > > > + * nobody can find it > > > > > + */ > > > > > + WARN_ON(status == nfserr_eagain); > > > > > > > > Ok, some more testing shows that this CAN happen. > > > > So this patch is inferior to the mutex one after all. > > > > > > > > > > Yeah, that can happen for all sorts of reasons. As Andrew pointed out, > > > you can get this when there is a lease break in progress, and that may > > > be occurring for a completely different stateid (or because of samba, > > > etc...) > > > > > > It may be possible to do something like this, but we'd need to audit > > > all of the handling of st_access_bmap (and the deny bmap) to ensure > > > that we get it right. > > > > > > For now, I think just turning that rwsem into a mutex is the best > > > solution. That is a per-stateid mutex so any contention is going to be > > > due to the client sending racing OPEN calls for the same inode anyway. > > > Allowing those to run in parallel again could be useful in some cases, > > > but most use-cases won't be harmed by that serialization. > > > > OK, so for now my plan is to take "nfsd: Always lock state exclusively" > > for 4.7. Thanks to both of you for your work on this…. > > > FYI, I just hit this again with the "Always lock state exclusively" patch too. > I hate when that happens. But it is much harder to hit now. > > the trace is also in the nfs4_get_vfs_file() that's called directly from > nfsd4_process_open2(). > > Otherwise the symptoms are pretty same - first I get the warning in set_access > that the flag is already set and then the nfsd4_free_file_rcu() one and then > unmount of underlying fs fails. 
> > What's strange is I am not sure what else can set the flag. > Basically set_access is called from nfs4_get_vfs_file() - under the mutex via > nfsd4_process_open2() directly or via nfs4_upgrade_open()..., > or from get_lock_access() - without the mutex, but my workload does not do any > file locking, so it should not really be hitting, right? > > > Ah! I think I see it. > This patch has the same problem as the spinlock moving one: > When we call nfs4_get_vfs_file() directly from nfsd4_process_open2(), there's no > check for anything, so supposed we take this diret cllign path, there's a > mutex_lock() just before the call, but what's to stop another thread from finding > this stateid meanwhile and being first with the mutex and then also settign the > access mode that our first thread no longer sets? > This makes it so that in fact we can never skip the access mode testing unless > testing and setting is atomically done under the same lock which it is not now. > The patch that extended the coverage of the fi_lock got that right, I did not hit > any leaks there, just the "unhandled EAGAIN" warn on, which is wrong in it's own right. > The surprising part is that the state that's not yet been through find_or_hash_clnt_odstate could still be found by somebody else? Is it really > supposed to work like that? > > So if we are to check the access mode at all times in nfs4_get_vfs_file(), > and return eagain, should we just check for that (all under the same > lock if we go with the mutex patch) and call into nfs4_upgrade_open in that > case again? > Or I guess it's even better if we resurrect the fi_lock coverage extension patch and > do it there, as that would mean at least the check and set are atomic wrt > locking? Good catch. Could we fix this by locking the mutex before hashing the new stateid (and having init_open_stateid return with it locked if it finds an existing one?). 
-- Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file 2016-06-12 1:33 ` Jeff Layton @ 2016-06-12 2:06 ` Oleg Drokin 2016-06-12 2:50 ` Jeff Layton 0 siblings, 1 reply; 45+ messages in thread From: Oleg Drokin @ 2016-06-12 2:06 UTC (permalink / raw) To: Jeff Layton; +Cc: J . Bruce Fields, linux-nfs, linux-kernel, Andrew W Elble On Jun 11, 2016, at 9:33 PM, Jeff Layton wrote: > On Sat, 2016-06-11 at 11:41 -0400, Oleg Drokin wrote: >> On Jun 10, 2016, at 4:55 PM, J . Bruce Fields wrote: >> >>> On Fri, Jun 10, 2016 at 06:50:33AM -0400, Jeff Layton wrote: >>>> On Fri, 2016-06-10 at 00:18 -0400, Oleg Drokin wrote: >>>>> On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote: >>>>> >>>>>> Currently there's an unprotected access mode check in >>>>>> nfs4_upgrade_open >>>>>> that then calls nfs4_get_vfs_file which in turn assumes whatever >>>>>> access mode was present in the state is still valid which is racy. >>>>>> Two nfs4_get_vfs_file van enter the same path as result and get two >>>>>> references to nfs4_file, but later drop would only happens once >>>>>> because >>>>>> access mode is only denoted by bits, so no refcounting. >>>>>> >>>>>> The locking around access mode testing is introduced to avoid this >>>>>> race. >>>>>> >>>>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> >>>>>> --- >>>>>> >>>>>> This patch performs equally well to the st_rwsem -> mutex >>>>>> conversion, >>>>>> but is a bit ligher-weight I imagine. >>>>>> For one it seems to allow truncates in parallel if we ever want it. 
>>>>>> >>>>>> fs/nfsd/nfs4state.c | 28 +++++++++++++++++++++++++--- >>>>>> 1 file changed, 25 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >>>>>> index f5f82e1..d4b9eba 100644 >>>>>> --- a/fs/nfsd/nfs4state.c >>>>>> +++ b/fs/nfsd/nfs4state.c >>>>>> @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct >>>>>> svc_rqst *rqstp, struct nfs4_file *fp, >>>>>> >>>>>> spin_lock(&fp->fi_lock); >>>>>> >>>>>> + if (test_access(open->op_share_access, stp)) { >>>>>> + spin_unlock(&fp->fi_lock); >>>>>> + return nfserr_eagain; >>>>>> + } >>>>>> + >>>>>> /* >>>>>> * Are we trying to set a deny mode that would conflict with >>>>>> * current access? >>>>>> @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, >>>>>> struct nfs4_file *fp, struct svc_fh *c >>>>>> __be32 status; >>>>>> unsigned char old_deny_bmap = stp->st_deny_bmap; >>>>>> >>>>>> - if (!test_access(open->op_share_access, stp)) >>>>>> - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, >>>>>> open); >>>>>> +again: >>>>>> + spin_lock(&fp->fi_lock); >>>>>> + if (!test_access(open->op_share_access, stp)) { >>>>>> + spin_unlock(&fp->fi_lock); >>>>>> + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, >>>>>> open); >>>>>> + /* >>>>>> + * Somebody won the race for access while we did >>>>>> not hold >>>>>> + * the lock here >>>>>> + */ >>>>>> + if (status == nfserr_eagain) >>>>>> + goto again; >>>>>> + return status; >>>>>> + } >>>>>> >>>>>> /* test and set deny mode */ >>>>>> - spin_lock(&fp->fi_lock); >>>>>> status = nfs4_file_check_deny(fp, open->op_share_deny); >>>>>> if (status == nfs_ok) { >>>>>> set_deny(open->op_share_deny, stp); >>>>>> @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, >>>>>> struct svc_fh *current_fh, struct nf >>>>>> status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, >>>>>> open); >>>>>> if (status) { >>>>>> up_read(&stp->st_rwsem); >>>>>> + /* >>>>>> + * EAGAIN is returned when there's a >>>>>> racing 
access, >>>>>> + * this should never happen as we are the >>>>>> only user >>>>>> + * of this new state, and since it's not >>>>>> yet hashed, >>>>>> + * nobody can find it >>>>>> + */ >>>>>> + WARN_ON(status == nfserr_eagain); >>>>> >>>>> Ok, some more testing shows that this CAN happen. >>>>> So this patch is inferior to the mutex one after all. >>>>> >>>> >>>> Yeah, that can happen for all sorts of reasons. As Andrew pointed out, >>>> you can get this when there is a lease break in progress, and that may >>>> be occurring for a completely different stateid (or because of samba, >>>> etc...) >>>> >>>> It may be possible to do something like this, but we'd need to audit >>>> all of the handling of st_access_bmap (and the deny bmap) to ensure >>>> that we get it right. >>>> >>>> For now, I think just turning that rwsem into a mutex is the best >>>> solution. That is a per-stateid mutex so any contention is going to be >>>> due to the client sending racing OPEN calls for the same inode anyway. >>>> Allowing those to run in parallel again could be useful in some cases, >>>> but most use-cases won't be harmed by that serialization. >>> >>> OK, so for now my plan is to take "nfsd: Always lock state exclusively" >>> for 4.7. Thanks to both of you for your work on this…. >> >> >> FYI, I just hit this again with the "Always lock state exclusively" patch too. >> I hate when that happens. But it is much harder to hit now. >> >> the trace is also in the nfs4_get_vfs_file() that's called directly from >> nfsd4_process_open2(). >> >> Otherwise the symptoms are pretty same - first I get the warning in set_access >> that the flag is already set and then the nfsd4_free_file_rcu() one and then >> unmount of underlying fs fails. >> >> What's strange is I am not sure what else can set the flag. 
>> Basically set_access is called from nfs4_get_vfs_file() - under the mutex via >> nfsd4_process_open2() directly or via nfs4_upgrade_open()..., >> or from get_lock_access() - without the mutex, but my workload does not do any >> file locking, so it should not really be hitting, right? >> >> >> Ah! I think I see it. >> This patch has the same problem as the spinlock moving one: >> When we call nfs4_get_vfs_file() directly from nfsd4_process_open2(), there's no >> check for anything, so supposed we take this diret cllign path, there's a >> mutex_lock() just before the call, but what's to stop another thread from finding >> this stateid meanwhile and being first with the mutex and then also settign the >> access mode that our first thread no longer sets? >> This makes it so that in fact we can never skip the access mode testing unless >> testing and setting is atomically done under the same lock which it is not now. >> The patch that extended the coverage of the fi_lock got that right, I did not hit >> any leaks there, just the "unhandled EAGAIN" warn on, which is wrong in it's own right. >> The surprising part is that the state that's not yet been through find_or_hash_clnt_odstate could still be found by somebody else? Is it really >> supposed to work like that? >> >> So if we are to check the access mode at all times in nfs4_get_vfs_file(), >> and return eagain, should we just check for that (all under the same >> lock if we go with the mutex patch) and call into nfs4_upgrade_open in that >> case again? >> Or I guess it's even better if we resurrect the fi_lock coverage extension patch and >> do it there, as that would mean at least the check and set are atomic wrt >> locking? > > Good catch. Could we fix this by locking the mutex before hashing the > new stateid (and having init_open_stateid return with it locked if it > finds an existing one?). Hm. 
I am trying to lock the newly initialized one and that seems to be holding up well (but I want 24 hours just to be extra sure). Hm, I just noticed a bug in this, so that'll reset the clock back. But I think we cannot return with a locked one if we found an existing one, due to lock inversion? I see that normally first we lock the state rwsem (now mutex) and then lock the fi_lock. Now if we make init_open_stateid() lock the new state mutex while the fi_lock is locked - that's probably ok, because we can do it before adding it to the list, so nobody can find it. Now the existing state that we find, we cannot really lock while holding that fi_lock, because what if there's a parallel thread that already holds the mutex and now wants the fi_lock? And so it's probably best to return with the existing state unlocked and let the caller lock it? Or do you think it's best to separately lock the found stp outside of the spinlock, just for consistency? ^ permalink raw reply [flat|nested] 45+ messages in thread
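The ordering being discussed can be sketched with a toy model. Everything here is illustrative (userspace mutexes stand in for both the st_mutex and the fi_lock spinlock, and a single slot stands in for the hash list): lock the new stateid's mutex before publishing it, since nobody can find it yet and the lock can therefore never block; but take a found stateid's mutex only after dropping fi_lock, to avoid the inversion against a thread that holds the mutex and wants fi_lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Toy model of the lock ordering (illustrative names, not the real nfsd
 * structures): st_mutex is always taken before fi_lock, never under it. */
struct toy_stateid {
    pthread_mutex_t st_mutex;
    int hashed;
};

static pthread_mutex_t fi_lock = PTHREAD_MUTEX_INITIALIZER; /* spinlock stand-in */
static struct toy_stateid *hashtable_slot;                  /* stands in for fi_stateids */

/* Returns the stateid to use, with its st_mutex held by the caller. */
struct toy_stateid *init_open_stateid(struct toy_stateid *new)
{
    struct toy_stateid *found = NULL;

    /* Safe: 'new' is not published yet, so locking its mutex before
     * taking fi_lock cannot block on anyone. */
    pthread_mutex_init(&new->st_mutex, NULL);
    pthread_mutex_lock(&new->st_mutex);

    pthread_mutex_lock(&fi_lock);
    if (hashtable_slot) {
        found = hashtable_slot;   /* an existing stateid was found */
    } else {
        new->hashed = 1;
        hashtable_slot = new;     /* publish with st_mutex already held */
    }
    pthread_mutex_unlock(&fi_lock);

    if (found) {
        /* Must not take a found stateid's mutex under fi_lock: another
         * thread may hold that mutex while waiting for fi_lock, which
         * would deadlock. Take it only after dropping the spinlock. */
        pthread_mutex_unlock(&new->st_mutex);
        pthread_mutex_lock(&found->st_mutex);
    }
    return found ? found : new;
}
```

The first caller publishes and keeps its own stateid locked; a second caller finds the published one and locks it only after the spinlock is dropped, so the mutex-before-spinlock ordering is never violated.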
* Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file 2016-06-12 2:06 ` Oleg Drokin @ 2016-06-12 2:50 ` Jeff Layton 2016-06-12 3:15 ` Oleg Drokin 0 siblings, 1 reply; 45+ messages in thread From: Jeff Layton @ 2016-06-12 2:50 UTC (permalink / raw) To: Oleg Drokin; +Cc: J . Bruce Fields, linux-nfs, linux-kernel, Andrew W Elble On Sat, 2016-06-11 at 22:06 -0400, Oleg Drokin wrote: > On Jun 11, 2016, at 9:33 PM, Jeff Layton wrote: > > > On Sat, 2016-06-11 at 11:41 -0400, Oleg Drokin wrote: > > > On Jun 10, 2016, at 4:55 PM, J . Bruce Fields wrote: > > > > > > > On Fri, Jun 10, 2016 at 06:50:33AM -0400, Jeff Layton wrote: > > > > > On Fri, 2016-06-10 at 00:18 -0400, Oleg Drokin wrote: > > > > > > On Jun 9, 2016, at 5:01 PM, Oleg Drokin wrote: > > > > > > > > > > > > > Currently there's an unprotected access mode check in > > > > > > > nfs4_upgrade_open > > > > > > > that then calls nfs4_get_vfs_file which in turn assumes whatever > > > > > > > access mode was present in the state is still valid which is racy. > > > > > > > Two nfs4_get_vfs_file van enter the same path as result and get two > > > > > > > references to nfs4_file, but later drop would only happens once > > > > > > > because > > > > > > > access mode is only denoted by bits, so no refcounting. > > > > > > > > > > > > > > The locking around access mode testing is introduced to avoid this > > > > > > > race. > > > > > > > > > > > > > > Signed-off-by: Oleg Drokin <green@linuxhacker.ru> > > > > > > > --- > > > > > > > > > > > > > > This patch performs equally well to the st_rwsem -> mutex > > > > > > > conversion, > > > > > > > but is a bit ligher-weight I imagine. > > > > > > > For one it seems to allow truncates in parallel if we ever want it. 
> > > > > > > > > > > > > > fs/nfsd/nfs4state.c | 28 +++++++++++++++++++++++++--- > > > > > > > 1 file changed, 25 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > > > > > > index f5f82e1..d4b9eba 100644 > > > > > > > --- a/fs/nfsd/nfs4state.c > > > > > > > +++ b/fs/nfsd/nfs4state.c > > > > > > > @@ -3958,6 +3958,11 @@ static __be32 nfs4_get_vfs_file(struct > > > > > > > svc_rqst *rqstp, struct nfs4_file *fp, > > > > > > > > > > > > > > spin_lock(&fp->fi_lock); > > > > > > > > > > > > > > + if (test_access(open->op_share_access, stp)) { > > > > > > > + spin_unlock(&fp->fi_lock); > > > > > > > + return nfserr_eagain; > > > > > > > + } > > > > > > > + > > > > > > > /* > > > > > > > * Are we trying to set a deny mode that would conflict with > > > > > > > * current access? > > > > > > > @@ -4017,11 +4022,21 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, > > > > > > > struct nfs4_file *fp, struct svc_fh *c > > > > > > > __be32 status; > > > > > > > unsigned char old_deny_bmap = stp->st_deny_bmap; > > > > > > > > > > > > > > - if (!test_access(open->op_share_access, stp)) > > > > > > > - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, > > > > > > > open); > > > > > > > +again: > > > > > > > + spin_lock(&fp->fi_lock); > > > > > > > + if (!test_access(open->op_share_access, stp)) { > > > > > > > + spin_unlock(&fp->fi_lock); > > > > > > > + status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, > > > > > > > open); > > > > > > > + /* > > > > > > > + * Somebody won the race for access while we did > > > > > > > not hold > > > > > > > + * the lock here > > > > > > > + */ > > > > > > > + if (status == nfserr_eagain) > > > > > > > + goto again; > > > > > > > + return status; > > > > > > > + } > > > > > > > > > > > > > > /* test and set deny mode */ > > > > > > > - spin_lock(&fp->fi_lock); > > > > > > > status = nfs4_file_check_deny(fp, open->op_share_deny); > > > > > > > if (status == nfs_ok) { > > > > > > > 
set_deny(open->op_share_deny, stp); > > > > > > > @@ -4361,6 +4376,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, > > > > > > > struct svc_fh *current_fh, struct nf > > > > > > > status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, > > > > > > > open); > > > > > > > if (status) { > > > > > > > up_read(&stp->st_rwsem); > > > > > > > + /* > > > > > > > + * EAGAIN is returned when there's a > > > > > > > racing access, > > > > > > > + * this should never happen as we are the > > > > > > > only user > > > > > > > + * of this new state, and since it's not > > > > > > > yet hashed, > > > > > > > + * nobody can find it > > > > > > > + */ > > > > > > > + WARN_ON(status == nfserr_eagain); > > > > > > > > > > > > Ok, some more testing shows that this CAN happen. > > > > > > So this patch is inferior to the mutex one after all. > > > > > > > > > > > > > > > > Yeah, that can happen for all sorts of reasons. As Andrew pointed out, > > > > > you can get this when there is a lease break in progress, and that may > > > > > be occurring for a completely different stateid (or because of samba, > > > > > etc...) > > > > > > > > > > It may be possible to do something like this, but we'd need to audit > > > > > all of the handling of st_access_bmap (and the deny bmap) to ensure > > > > > that we get it right. > > > > > > > > > > For now, I think just turning that rwsem into a mutex is the best > > > > > solution. That is a per-stateid mutex so any contention is going to be > > > > > due to the client sending racing OPEN calls for the same inode anyway. > > > > > Allowing those to run in parallel again could be useful in some cases, > > > > > but most use-cases won't be harmed by that serialization. > > > > > > > > OK, so for now my plan is to take "nfsd: Always lock state exclusively" > > > > for 4.7. Thanks to both of you for your work on this…. > > > > > > > > > FYI, I just hit this again with the "Always lock state exclusively" patch too. > > > I hate when that happens. 
But it is much harder to hit now. > > > > > > the trace is also in the nfs4_get_vfs_file() that's called directly from > > > nfsd4_process_open2(). > > > > > > Otherwise the symptoms are pretty same - first I get the warning in set_access > > > that the flag is already set and then the nfsd4_free_file_rcu() one and then > > > unmount of underlying fs fails. > > > > > > What's strange is I am not sure what else can set the flag. > > > Basically set_access is called from nfs4_get_vfs_file() - under the mutex via > > > nfsd4_process_open2() directly or via nfs4_upgrade_open()..., > > > or from get_lock_access() - without the mutex, but my workload does not do any > > > file locking, so it should not really be hitting, right? > > > > > > > > > Ah! I think I see it. > > > This patch has the same problem as the spinlock moving one: > > > When we call nfs4_get_vfs_file() directly from nfsd4_process_open2(), there's no > > > check for anything, so supposed we take this diret cllign path, there's a > > > mutex_lock() just before the call, but what's to stop another thread from finding > > > this stateid meanwhile and being first with the mutex and then also settign the > > > access mode that our first thread no longer sets? > > > This makes it so that in fact we can never skip the access mode testing unless > > > testing and setting is atomically done under the same lock which it is not now. > > > The patch that extended the coverage of the fi_lock got that right, I did not hit > > > any leaks there, just the "unhandled EAGAIN" warn on, which is wrong in it's own right. > > > The surprising part is that the state that's not yet been through find_or_hash_clnt_odstate could still be found by somebody else? Is it really > > > supposed to work like that? 
> > > > > > So if we are to check the access mode at all times in nfs4_get_vfs_file(), > > > and return eagain, should we just check for that (all under the same > > > lock if we go with the mutex patch) and call into nfs4_upgrade_open in that > > > case again? > > > Or I guess it's even better if we resurrect the fi_lock coverage extension patch and > > > do it there, as that would mean at least the check and set are atomic wrt > > > locking? > > > > Good catch. Could we fix this by locking the mutex before hashing the > > new stateid (and having init_open_stateid return with it locked if it > > finds an existing one?). > > Hm. I am trying to lock the newly initialized one and that seems to be holding up > well (but I want 24 hours just to be extra sure). > Hn, I just noticed a bug in this, so that'll reset the clock back. > > But I think we cannot return with locked one if we found existing one due to lock > inversion? > I see that normally first we lock the state rwsem (now mutex) and then > lock the fi_lock. > Now if we make init_open_stateid() to lock the new state mutex while the fi_lock > is locked - that's probably ok, because we can do it before adding it to the list, > so nobody can find it. > Now the existing state that we find, we cannot really lock while holding that fi_lock, > because what if there's a parallel thread that already holds the mutex and now > wants the fi_lock? > And so it's probably best to return with existing state unlocked and let caller lock it? > Or do you think it's best to separately lock the found stp outside of spinlock > just for consistency? I think we just have to ensure that if the new stateid is hashed that its mutex is locked prior to being inserted into the hashtable. That should prevent the race you mentioned. If we find an existing one in the hashtable in init_open_stateid, then we _can_ take the mutex after dropping the spinlocks, since we won't call release_open_stateid in that case anyway. 
We'll also need to consider what happens if nfs4_get_vfs_file fails after we hashed the stateid, but then another task finds it while processing another open. So we might have to have release_open_stateid unlock the mutex after unhashing the stateid, but before putting the reference, and then have init_open_stateid check to see if the thing is still hashed after it gets the mutex. -- Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 45+ messages in thread
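The recheck-after-lock dance proposed here can also be sketched as a toy model (illustrative names, userspace mutexes for the kernel locks, refcounting omitted for brevity): release_open_stateid() unhashes and drops the mutex before the final put, and a finder that got the mutex must verify the stateid is still hashed, restarting the lookup if it lost that race.

```c
#include <pthread.h>
#include <stddef.h>

/* Toy model of the recovery being discussed (not the real nfsd code). If
 * nfs4_get_vfs_file() fails after the new stateid was hashed, it must be
 * unhashed so the client never sees a soon-to-be-defunct stateid; anyone
 * who found it concurrently must detect that and retry. */
struct toy_stateid {
    pthread_mutex_t st_mutex;
    int hashed;
};

static pthread_mutex_t cl_lock = PTHREAD_MUTEX_INITIALIZER; /* spinlock stand-in */
static struct toy_stateid *hashtable_slot;

/* Called with stp->st_mutex held; unhashes, then unlocks before the
 * final reference would be put. */
void release_open_stateid(struct toy_stateid *stp)
{
    pthread_mutex_lock(&cl_lock);
    stp->hashed = 0;
    hashtable_slot = NULL;                 /* unhash */
    pthread_mutex_unlock(&cl_lock);
    pthread_mutex_unlock(&stp->st_mutex);  /* before dropping the last ref */
}

/* Returns a found stateid locked and still hashed, or NULL. */
struct toy_stateid *find_existing_open(void)
{
    for (;;) {
        pthread_mutex_lock(&cl_lock);
        struct toy_stateid *stp = hashtable_slot;
        pthread_mutex_unlock(&cl_lock);
        if (!stp)
            return NULL;
        pthread_mutex_lock(&stp->st_mutex);
        if (stp->hashed)
            return stp;                    /* won the race; safe to use */
        /* Lost to release_open_stateid(): restart from scratch. */
        pthread_mutex_unlock(&stp->st_mutex);
    }
}
```

The real code additionally has to hold a reference across the lookup so the stateid stays valid between dropping the spinlock and taking the mutex; that bookkeeping is elided here.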
* Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file 2016-06-12 2:50 ` Jeff Layton @ 2016-06-12 3:15 ` Oleg Drokin 2016-06-12 13:13 ` Jeff Layton 2016-06-13 1:26 ` [PATCH v2] nfsd: Always lock state exclusively Oleg Drokin 0 siblings, 2 replies; 45+ messages in thread From: Oleg Drokin @ 2016-06-12 3:15 UTC (permalink / raw) To: Jeff Layton; +Cc: J . Bruce Fields, linux-nfs, linux-kernel, Andrew W Elble On Jun 11, 2016, at 10:50 PM, Jeff Layton wrote: > On Sat, 2016-06-11 at 22:06 -0400, Oleg Drokin wrote: >> >> Hm. I am trying to lock the newly initialized one and that seems to be holding up >> well (but I want 24 hours just to be extra sure). >> Hn, I just noticed a bug in this, so that'll reset the clock back. >> >> But I think we cannot return with locked one if we found existing one due to lock >> inversion? >> I see that normally first we lock the state rwsem (now mutex) and then >> lock the fi_lock. >> Now if we make init_open_stateid() to lock the new state mutex while the fi_lock >> is locked - that's probably ok, because we can do it before adding it to the list, >> so nobody can find it. >> Now the existing state that we find, we cannot really lock while holding that fi_lock, >> because what if there's a parallel thread that already holds the mutex and now >> wants the fi_lock? >> And so it's probably best to return with existing state unlocked and let caller lock it? >> Or do you think it's best to separately lock the found stp outside of spinlock >> just for consistency? > > I think we just have to ensure that if the new stateid is hashed that > its mutex is locked prior to being inserted into the hashtable. That > should prevent the race you mentioned. > > If we find an existing one in the hashtable in init_open_stateid, then > we _can_ take the mutex after dropping the spinlocks, since we won't > call release_open_stateid in that case anyway. Yes. 
> We'll also need to consider what happens if nfs4_get_vfs_file fails > after we hashed the stateid, but then another task finds it while > processing another open. So we might have to have release_open_stateid > unlock the mutex after unhashing the stateid, but before putting the > reference, and then have init_open_stateid check to see if the thing is > still hashed after it gets the mutex. Hm. So what's going to go wrong if another user reuses the unhashed stateid? As long as they drop it once they are done it'll be freed and all is fine, no? Are there other implications? Hm, it looks like free_ol_stateid_reaplist() just frees the thing without any looking into mutexes and stuff? Ok, so we get the mutex, check that the stateid is hashed, it's not anymore (actually unhashing could be done without mutex too, right? so just mutex held is not going to protect us), then we need to drop the mutex and restart the search from scratch (including all relocking), I assume? I guess I'll have it as a separate follow on patch. We'll probably also need some fault-injection here to trigger this case, as triggering it "naturally" will be a tough problem even on my mega-racy setup. Something like: if (swapstp) { … } if (FAULTINJECTION) { msleep(some_random_time); status = nfserr_eio; } else status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); should increase the chance. Ideally there'd be a way to trigger this case more deterministically, how do I have two OPEN requests in parallel in NFS for the same file, just have two threads do it and that would 100% result in two requests, no merging anywhere along the way that I need to be aware of? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] nfsd: Close a race between access checking/setting in nfs4_get_vfs_file 2016-06-12 3:15 ` Oleg Drokin @ 2016-06-12 13:13 ` Jeff Layton 2016-06-13 1:26 ` [PATCH v2] nfsd: Always lock state exclusively Oleg Drokin 1 sibling, 0 replies; 45+ messages in thread From: Jeff Layton @ 2016-06-12 13:13 UTC (permalink / raw) To: Oleg Drokin; +Cc: J . Bruce Fields, linux-nfs, linux-kernel, Andrew W Elble On Sat, 2016-06-11 at 23:15 -0400, Oleg Drokin wrote: > On Jun 11, 2016, at 10:50 PM, Jeff Layton wrote: > > > > > On Sat, 2016-06-11 at 22:06 -0400, Oleg Drokin wrote: > > > > > > > > > Hm. I am trying to lock the newly initialized one and that seems to be holding up > > > well (but I want 24 hours just to be extra sure). > > > Hn, I just noticed a bug in this, so that'll reset the clock back. > > > > > > But I think we cannot return with locked one if we found existing one due to lock > > > inversion? > > > I see that normally first we lock the state rwsem (now mutex) and then > > > lock the fi_lock. > > > Now if we make init_open_stateid() to lock the new state mutex while the fi_lock > > > is locked - that's probably ok, because we can do it before adding it to the list, > > > so nobody can find it. > > > Now the existing state that we find, we cannot really lock while holding that fi_lock, > > > because what if there's a parallel thread that already holds the mutex and now > > > wants the fi_lock? > > > And so it's probably best to return with existing state unlocked and let caller lock it? > > > Or do you think it's best to separately lock the found stp outside of spinlock > > > just for consistency? > > I think we just have to ensure that if the new stateid is hashed that > > its mutex is locked prior to being inserted into the hashtable. That > > should prevent the race you mentioned. 
> > > > If we find an existing one in the hashtable in init_open_stateid, then > > we _can_ take the mutex after dropping the spinlocks, since we won't > > call release_open_stateid in that case anyway. > Yes. > > > > > We'll also need to consider what happens if nfs4_get_vfs_file fails > > after we hashed the stateid, but then another task finds it while > > processing another open. So we might have to have release_open_stateid > > unlock the mutex after unhashing the stateid, but before putting the > > reference, and then have init_open_stateid check to see if the thing is > > still hashed after it gets the mutex. > Hm. > So what's going to go wrong if another user reuses the unhashed stateid? > As long as they drop it once they are done it'll be freed and all is fine, no? > Are there other implications? > Hm, it looks like free_ol_stateid_reaplist() just frees the thing without any looking > into mutexes and stuff? > The problem there is that you're sending a very soon to be defunct stateid to the client as valid, which means the client will end up in state recovery (at best). The initial open is somewhat of a special case...we are hashing a new stateid. If the operation that hashes it fails, then we want to make like it never happened at all. The client will never see it, so we don't want to leave it hanging around on the server. We need to unhash the stateid in that case. So, yes, the mutex doesn't really prevent the thing from being unhashed, but if you take the mutex and the thing is still hashed afterward, then you know that the above situation didn't happen. The initial incarnation of the stateid will have been sent to the client. > Ok, so we get the mutex, check that the stateid is hashed, it's not anymore > (actually unhashing could be done without mutex too, right? so just mutex held > is not going to protect us), then we need to drop the mutex and restart the search > from scratch (including all relocking), I assume? Yeah, that's what I was thinking. 
It should be a fairly rare race so taking an extra hit on the locking in that case shouldn't be too bad. I think there's quite a bit of opportunity to simplify the open path by reworking this code as well. What we probably need to do is turn nfsd4_find_existing_open into something that initializes and hashes the stateid if it doesn't find one, and returns it with the mutex locked. It could also do the check to see if an existing stateid was still hashed after taking the mutex and could redrive the search if it isn't. That would also make the open code more resilient in the face of OPEN vs. CLOSE races, which would also be nice... > I guess I'll have it as a separate follow on patch. > We'll probably also need some fault-injection here to trigger this case, as triggering > it "naturally" will be a tough problem even on my mega-racy setup. > > Something like: > if (swapstp) { > … > } > if (FAULTINJECTION) { > msleep(some_random_time); > status = nfserr_eio; > } else > status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); > > should increase the chance. Sure. Look for CONFIG_NFSD_FAULT_INJECTION. You might be able to use that framework (though it is a bit manky). > Ideally there'd be a way to trigger this case more deterministically, > how do I have two OPEN requests in parallel in NFS for the same file, > just have two threads do it and that would 100% result in two requests, > no merging anywhere along the way that I need to be aware of? Yeah, that's basically it. You need two racing OPEN calls for the same stateid. Might be easiest to do with something like pynfs... -- Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 45+ messages in thread
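The simplest first attempt at the reproducer discussed above is two threads calling open(2) on the same path on the NFS mount. One caveat, which is why pynfs is suggested for determinism: whether the in-kernel Linux NFS client actually emits two separate OPEN compounds for concurrent opens, rather than coalescing open state, is not guaranteed. The helper name and the path below are illustrative, not from the thread.

```c
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

static void *opener(void *arg)
{
    const char *path = arg;
    /* Each thread opens and closes the same file. Over an NFSv4 mount
     * this is intended to produce racing OPEN calls on the server. */
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd >= 0)
        close(fd);
    return NULL;
}

/* Fire two simultaneous open(2) calls at the same path; run this in a
 * tight loop to widen the race window. Returns 0 once both threads ran.
 * Point 'path' at a file on the NFSv4 mount under test. */
int run_racing_opens(const char *path)
{
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, opener, (void *)path))
        return -1;
    if (pthread_create(&t2, NULL, opener, (void *)path)) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Combined with the fault-injection delay sketched earlier in the thread, repeating this should make the failed-open-after-hash window much easier to hit than waiting for it "naturally".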
* [PATCH v2] nfsd: Always lock state exclusively. 2016-06-12 3:15 ` Oleg Drokin 2016-06-12 13:13 ` Jeff Layton @ 2016-06-13 1:26 ` Oleg Drokin 2016-06-14 15:38 ` J . Bruce Fields 2016-06-14 15:46 ` [PATCH v2] nfsd: Always lock state exclusively J . Bruce Fields 1 sibling, 2 replies; 45+ messages in thread From: Oleg Drokin @ 2016-06-13 1:26 UTC (permalink / raw) To: Jeff Layton, J . Bruce Fields; +Cc: linux-nfs, linux-kernel, Oleg Drokin

It used to be the case that state had an rwlock that was locked for write
by downgrades, but for read by upgrades (opens). The problem is that
if there are two competing opens for the same state, they step on
each other's toes, potentially leaking file descriptors from the state
structure, since the access mode is a bitmap that is only set once.

Extend the locked region in nfsd4_process_open2() to avoid racing
entry into nfs4_get_vfs_file().
Make init_open_stateid() return with the stateid locked, to be unlocked
by the caller.

This version held up pretty well in my testing for 24 hours.
It still does not address the situation where, during one of the racing
nfs4_get_vfs_file() calls, we get an error from one (the first?) of them.
That is to be addressed in a separate patch after having a solid
reproducer (potentially using some fault injection).
Signed-off-by: Oleg Drokin <green@linuxhacker.ru> --- fs/nfsd/nfs4state.c | 47 +++++++++++++++++++++++++++-------------------- fs/nfsd/state.h | 2 +- 2 files changed, 28 insertions(+), 21 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f5f82e1..fa5fb5a 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3487,6 +3487,10 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, struct nfs4_openowner *oo = open->op_openowner; struct nfs4_ol_stateid *retstp = NULL; + /* We are moving these outside of the spinlocks to avoid the warnings */ + mutex_init(&stp->st_mutex); + mutex_lock(&stp->st_mutex); + spin_lock(&oo->oo_owner.so_client->cl_lock); spin_lock(&fp->fi_lock); @@ -3502,13 +3506,14 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, stp->st_access_bmap = 0; stp->st_deny_bmap = 0; stp->st_openstp = NULL; - init_rwsem(&stp->st_rwsem); list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids); list_add(&stp->st_perfile, &fp->fi_stateids); out_unlock: spin_unlock(&fp->fi_lock); spin_unlock(&oo->oo_owner.so_client->cl_lock); + if (retstp) + mutex_lock(&retstp->st_mutex); return retstp; } @@ -4335,32 +4340,34 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf */ if (stp) { /* Stateid was found, this is an OPEN upgrade */ - down_read(&stp->st_rwsem); + mutex_lock(&stp->st_mutex); status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); if (status) { - up_read(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); goto out; } } else { stp = open->op_stp; open->op_stp = NULL; + /* + * init_open_stateid() either returns a locked stateid + * it found, or initializes and locks the new one we passed in + */ swapstp = init_open_stateid(stp, fp, open); if (swapstp) { nfs4_put_stid(&stp->st_stid); stp = swapstp; - down_read(&stp->st_rwsem); status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); if (status) { - up_read(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); 
goto out; } goto upgrade_out; } - down_read(&stp->st_rwsem); status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); if (status) { - up_read(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); release_open_stateid(stp); goto out; } @@ -4372,7 +4379,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf } upgrade_out: nfs4_inc_and_copy_stateid(&open->op_stateid, &stp->st_stid); - up_read(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); if (nfsd4_has_session(&resp->cstate)) { if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) { @@ -4977,12 +4984,12 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_ * revoked delegations are kept only for free_stateid. */ return nfserr_bad_stateid; - down_write(&stp->st_rwsem); + mutex_lock(&stp->st_mutex); status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate)); if (status == nfs_ok) status = nfs4_check_fh(current_fh, &stp->st_stid); if (status != nfs_ok) - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); return status; } @@ -5030,7 +5037,7 @@ static __be32 nfs4_preprocess_confirmed_seqid_op(struct nfsd4_compound_state *cs return status; oo = openowner(stp->st_stateowner); if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) { - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); nfs4_put_stid(&stp->st_stid); return nfserr_bad_stateid; } @@ -5062,12 +5069,12 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, oo = openowner(stp->st_stateowner); status = nfserr_bad_stateid; if (oo->oo_flags & NFS4_OO_CONFIRMED) { - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); goto put_stateid; } oo->oo_flags |= NFS4_OO_CONFIRMED; nfs4_inc_and_copy_stateid(&oc->oc_resp_stateid, &stp->st_stid); - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n", __func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid)); @@ -5143,7 +5150,7 @@ 
nfsd4_open_downgrade(struct svc_rqst *rqstp, nfs4_inc_and_copy_stateid(&od->od_stateid, &stp->st_stid); status = nfs_ok; put_stateid: - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); nfs4_put_stid(&stp->st_stid); out: nfsd4_bump_seqid(cstate, status); @@ -5196,7 +5203,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto out; nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid); - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); nfsd4_close_open_stateid(stp); @@ -5422,7 +5429,7 @@ init_lock_stateid(struct nfs4_ol_stateid *stp, struct nfs4_lockowner *lo, stp->st_access_bmap = 0; stp->st_deny_bmap = open_stp->st_deny_bmap; stp->st_openstp = open_stp; - init_rwsem(&stp->st_rwsem); + mutex_init(&stp->st_mutex); list_add(&stp->st_locks, &open_stp->st_locks); list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids); spin_lock(&fp->fi_lock); @@ -5591,7 +5598,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, &open_stp, nn); if (status) goto out; - up_write(&open_stp->st_rwsem); + mutex_unlock(&open_stp->st_mutex); open_sop = openowner(open_stp->st_stateowner); status = nfserr_bad_stateid; if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid, @@ -5600,7 +5607,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = lookup_or_create_lock_state(cstate, open_stp, lock, &lock_stp, &new); if (status == nfs_ok) - down_write(&lock_stp->st_rwsem); + mutex_lock(&lock_stp->st_mutex); } else { status = nfs4_preprocess_seqid_op(cstate, lock->lk_old_lock_seqid, @@ -5704,7 +5711,7 @@ out: seqid_mutating_err(ntohl(status))) lock_sop->lo_owner.so_seqid++; - up_write(&lock_stp->st_rwsem); + mutex_unlock(&lock_stp->st_mutex); /* * If this is a new, never-before-used stateid, and we are @@ -5874,7 +5881,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fput: fput(filp); put_stateid: - up_write(&stp->st_rwsem); + 
mutex_unlock(&stp->st_mutex); nfs4_put_stid(&stp->st_stid); out: nfsd4_bump_seqid(cstate, status); diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 986e51e..64053ea 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -535,7 +535,7 @@ struct nfs4_ol_stateid { unsigned char st_access_bmap; unsigned char st_deny_bmap; struct nfs4_ol_stateid *st_openstp; - struct rw_semaphore st_rwsem; + struct mutex st_mutex; }; static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s) -- 2.7.4 ^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH v2] nfsd: Always lock state exclusively. 2016-06-13 1:26 ` [PATCH v2] nfsd: Always lock state exclusively Oleg Drokin @ 2016-06-14 15:38 ` J . Bruce Fields 2016-06-14 15:53 ` Oleg Drokin 2016-06-14 15:46 ` [PATCH v2] nfsd: Always lock state exclusively J . Bruce Fields 1 sibling, 1 reply; 45+ messages in thread From: J . Bruce Fields @ 2016-06-14 15:38 UTC (permalink / raw) To: Oleg Drokin; +Cc: Jeff Layton, linux-nfs, linux-kernel On Sun, Jun 12, 2016 at 09:26:27PM -0400, Oleg Drokin wrote: > It used to be the case that state had an rwlock that was locked for write > by downgrades, but for read for upgrades (opens). Well, the problem is > if there are two competing opens for the same state, they step on > each other toes potentially leading to leaking file descriptors > from the state structure, since access mode is a bitmap only set once. > > Extend the holding region around in nfsd4_process_open2() to avoid > racing entry into nfs4_get_vfs_file(). > Make init_open_stateid() return with locked stateid to be unlocked > by the caller. > > Now this version held up pretty well in my testing for 24 hours. > It still does not address the situation if during one of the racing > nfs4_get_vfs_file() calls we are getting an error from one (first?) > of them. This is to be addressed in a separate patch after having a > solid reproducer (potentially using some fault injection). 
> > Signed-off-by: Oleg Drokin <green@linuxhacker.ru> > --- > fs/nfsd/nfs4state.c | 47 +++++++++++++++++++++++++++-------------------- > fs/nfsd/state.h | 2 +- > 2 files changed, 28 insertions(+), 21 deletions(-) > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index f5f82e1..fa5fb5a 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -3487,6 +3487,10 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > struct nfs4_openowner *oo = open->op_openowner; > struct nfs4_ol_stateid *retstp = NULL; > > + /* We are moving these outside of the spinlocks to avoid the warnings */ > + mutex_init(&stp->st_mutex); > + mutex_lock(&stp->st_mutex); > + > spin_lock(&oo->oo_owner.so_client->cl_lock); > spin_lock(&fp->fi_lock); > > @@ -3502,13 +3506,14 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > stp->st_access_bmap = 0; > stp->st_deny_bmap = 0; > stp->st_openstp = NULL; > - init_rwsem(&stp->st_rwsem); > list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids); > list_add(&stp->st_perfile, &fp->fi_stateids); > > out_unlock: > spin_unlock(&fp->fi_lock); > spin_unlock(&oo->oo_owner.so_client->cl_lock); > + if (retstp) > + mutex_lock(&retstp->st_mutex); > return retstp; You're returning with both stp->st_mutex and retstp->st_mutex locked. Did you mean to drop that first lock in the (retstp) case, or am I missing something? --b. 
> } > > @@ -4335,32 +4340,34 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf > */ > if (stp) { > /* Stateid was found, this is an OPEN upgrade */ > - down_read(&stp->st_rwsem); > + mutex_lock(&stp->st_mutex); > status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); > if (status) { > - up_read(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > goto out; > } > } else { > stp = open->op_stp; > open->op_stp = NULL; > + /* > + * init_open_stateid() either returns a locked stateid > + * it found, or initializes and locks the new one we passed in > + */ > swapstp = init_open_stateid(stp, fp, open); > if (swapstp) { > nfs4_put_stid(&stp->st_stid); > stp = swapstp; > - down_read(&stp->st_rwsem); > status = nfs4_upgrade_open(rqstp, fp, current_fh, > stp, open); > if (status) { > - up_read(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > goto out; > } > goto upgrade_out; > } > - down_read(&stp->st_rwsem); > status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); > if (status) { > - up_read(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > release_open_stateid(stp); > goto out; > } > @@ -4372,7 +4379,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf > } > upgrade_out: > nfs4_inc_and_copy_stateid(&open->op_stateid, &stp->st_stid); > - up_read(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > > if (nfsd4_has_session(&resp->cstate)) { > if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) { > @@ -4977,12 +4984,12 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_ > * revoked delegations are kept only for free_stateid. 
> */ > return nfserr_bad_stateid; > - down_write(&stp->st_rwsem); > + mutex_lock(&stp->st_mutex); > status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate)); > if (status == nfs_ok) > status = nfs4_check_fh(current_fh, &stp->st_stid); > if (status != nfs_ok) > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > return status; > } > > @@ -5030,7 +5037,7 @@ static __be32 nfs4_preprocess_confirmed_seqid_op(struct nfsd4_compound_state *cs > return status; > oo = openowner(stp->st_stateowner); > if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) { > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > nfs4_put_stid(&stp->st_stid); > return nfserr_bad_stateid; > } > @@ -5062,12 +5069,12 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > oo = openowner(stp->st_stateowner); > status = nfserr_bad_stateid; > if (oo->oo_flags & NFS4_OO_CONFIRMED) { > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > goto put_stateid; > } > oo->oo_flags |= NFS4_OO_CONFIRMED; > nfs4_inc_and_copy_stateid(&oc->oc_resp_stateid, &stp->st_stid); > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n", > __func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid)); > > @@ -5143,7 +5150,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp, > nfs4_inc_and_copy_stateid(&od->od_stateid, &stp->st_stid); > status = nfs_ok; > put_stateid: > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > nfs4_put_stid(&stp->st_stid); > out: > nfsd4_bump_seqid(cstate, status); > @@ -5196,7 +5203,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > if (status) > goto out; > nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid); > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > > nfsd4_close_open_stateid(stp); > > @@ -5422,7 +5429,7 @@ init_lock_stateid(struct nfs4_ol_stateid *stp, struct 
nfs4_lockowner *lo, > stp->st_access_bmap = 0; > stp->st_deny_bmap = open_stp->st_deny_bmap; > stp->st_openstp = open_stp; > - init_rwsem(&stp->st_rwsem); > + mutex_init(&stp->st_mutex); > list_add(&stp->st_locks, &open_stp->st_locks); > list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids); > spin_lock(&fp->fi_lock); > @@ -5591,7 +5598,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > &open_stp, nn); > if (status) > goto out; > - up_write(&open_stp->st_rwsem); > + mutex_unlock(&open_stp->st_mutex); > open_sop = openowner(open_stp->st_stateowner); > status = nfserr_bad_stateid; > if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid, > @@ -5600,7 +5607,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > status = lookup_or_create_lock_state(cstate, open_stp, lock, > &lock_stp, &new); > if (status == nfs_ok) > - down_write(&lock_stp->st_rwsem); > + mutex_lock(&lock_stp->st_mutex); > } else { > status = nfs4_preprocess_seqid_op(cstate, > lock->lk_old_lock_seqid, > @@ -5704,7 +5711,7 @@ out: > seqid_mutating_err(ntohl(status))) > lock_sop->lo_owner.so_seqid++; > > - up_write(&lock_stp->st_rwsem); > + mutex_unlock(&lock_stp->st_mutex); > > /* > * If this is a new, never-before-used stateid, and we are > @@ -5874,7 +5881,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > fput: > fput(filp); > put_stateid: > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > nfs4_put_stid(&stp->st_stid); > out: > nfsd4_bump_seqid(cstate, status); > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h > index 986e51e..64053ea 100644 > --- a/fs/nfsd/state.h > +++ b/fs/nfsd/state.h > @@ -535,7 +535,7 @@ struct nfs4_ol_stateid { > unsigned char st_access_bmap; > unsigned char st_deny_bmap; > struct nfs4_ol_stateid *st_openstp; > - struct rw_semaphore st_rwsem; > + struct mutex st_mutex; > }; > > static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s) > -- > 2.7.4 ^ 
permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] nfsd: Always lock state exclusively. 2016-06-14 15:38 ` J . Bruce Fields @ 2016-06-14 15:53 ` Oleg Drokin 2016-06-14 18:50 ` J . Bruce Fields 0 siblings, 1 reply; 45+ messages in thread From: Oleg Drokin @ 2016-06-14 15:53 UTC (permalink / raw) To: J . Bruce Fields; +Cc: Jeff Layton, linux-nfs, linux-kernel On Jun 14, 2016, at 11:38 AM, J . Bruce Fields wrote: > On Sun, Jun 12, 2016 at 09:26:27PM -0400, Oleg Drokin wrote: >> It used to be the case that state had an rwlock that was locked for write >> by downgrades, but for read for upgrades (opens). Well, the problem is >> if there are two competing opens for the same state, they step on >> each other toes potentially leading to leaking file descriptors >> from the state structure, since access mode is a bitmap only set once. >> >> Extend the holding region around in nfsd4_process_open2() to avoid >> racing entry into nfs4_get_vfs_file(). >> Make init_open_stateid() return with locked stateid to be unlocked >> by the caller. >> >> Now this version held up pretty well in my testing for 24 hours. >> It still does not address the situation if during one of the racing >> nfs4_get_vfs_file() calls we are getting an error from one (first?) >> of them. This is to be addressed in a separate patch after having a >> solid reproducer (potentially using some fault injection). 
>> >> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> >> --- >> fs/nfsd/nfs4state.c | 47 +++++++++++++++++++++++++++-------------------- >> fs/nfsd/state.h | 2 +- >> 2 files changed, 28 insertions(+), 21 deletions(-) >> >> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >> index f5f82e1..fa5fb5a 100644 >> --- a/fs/nfsd/nfs4state.c >> +++ b/fs/nfsd/nfs4state.c >> @@ -3487,6 +3487,10 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, >> struct nfs4_openowner *oo = open->op_openowner; >> struct nfs4_ol_stateid *retstp = NULL; >> >> + /* We are moving these outside of the spinlocks to avoid the warnings */ >> + mutex_init(&stp->st_mutex); >> + mutex_lock(&stp->st_mutex); >> + >> spin_lock(&oo->oo_owner.so_client->cl_lock); >> spin_lock(&fp->fi_lock); >> >> @@ -3502,13 +3506,14 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, >> stp->st_access_bmap = 0; >> stp->st_deny_bmap = 0; >> stp->st_openstp = NULL; >> - init_rwsem(&stp->st_rwsem); >> list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids); >> list_add(&stp->st_perfile, &fp->fi_stateids); >> >> out_unlock: >> spin_unlock(&fp->fi_lock); >> spin_unlock(&oo->oo_owner.so_client->cl_lock); >> + if (retstp) >> + mutex_lock(&retstp->st_mutex); >> return retstp; > > You're returning with both stp->st_mutex and retstp->st_mutex locked. > Did you mean to drop that first lock in the (retstp) case, or am I > missing something? Well, I think it's ok (perhaps worthy of a comment) it's that if we matched a different retstp state, then stp is not used and either released right away or even if reused, it would be reinitialized in another call to init_open_stateid(), so it's fine? > > --b. 
> >> } >> >> @@ -4335,32 +4340,34 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf >> */ >> if (stp) { >> /* Stateid was found, this is an OPEN upgrade */ >> - down_read(&stp->st_rwsem); >> + mutex_lock(&stp->st_mutex); >> status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); >> if (status) { >> - up_read(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> goto out; >> } >> } else { >> stp = open->op_stp; >> open->op_stp = NULL; >> + /* >> + * init_open_stateid() either returns a locked stateid >> + * it found, or initializes and locks the new one we passed in >> + */ >> swapstp = init_open_stateid(stp, fp, open); >> if (swapstp) { >> nfs4_put_stid(&stp->st_stid); >> stp = swapstp; >> - down_read(&stp->st_rwsem); >> status = nfs4_upgrade_open(rqstp, fp, current_fh, >> stp, open); >> if (status) { >> - up_read(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> goto out; >> } >> goto upgrade_out; >> } >> - down_read(&stp->st_rwsem); >> status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); >> if (status) { >> - up_read(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> release_open_stateid(stp); >> goto out; >> } >> @@ -4372,7 +4379,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf >> } >> upgrade_out: >> nfs4_inc_and_copy_stateid(&open->op_stateid, &stp->st_stid); >> - up_read(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> >> if (nfsd4_has_session(&resp->cstate)) { >> if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) { >> @@ -4977,12 +4984,12 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_ >> * revoked delegations are kept only for free_stateid. 
>> */ >> return nfserr_bad_stateid; >> - down_write(&stp->st_rwsem); >> + mutex_lock(&stp->st_mutex); >> status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate)); >> if (status == nfs_ok) >> status = nfs4_check_fh(current_fh, &stp->st_stid); >> if (status != nfs_ok) >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> return status; >> } >> >> @@ -5030,7 +5037,7 @@ static __be32 nfs4_preprocess_confirmed_seqid_op(struct nfsd4_compound_state *cs >> return status; >> oo = openowner(stp->st_stateowner); >> if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) { >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> nfs4_put_stid(&stp->st_stid); >> return nfserr_bad_stateid; >> } >> @@ -5062,12 +5069,12 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, >> oo = openowner(stp->st_stateowner); >> status = nfserr_bad_stateid; >> if (oo->oo_flags & NFS4_OO_CONFIRMED) { >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> goto put_stateid; >> } >> oo->oo_flags |= NFS4_OO_CONFIRMED; >> nfs4_inc_and_copy_stateid(&oc->oc_resp_stateid, &stp->st_stid); >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n", >> __func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid)); >> >> @@ -5143,7 +5150,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp, >> nfs4_inc_and_copy_stateid(&od->od_stateid, &stp->st_stid); >> status = nfs_ok; >> put_stateid: >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> nfs4_put_stid(&stp->st_stid); >> out: >> nfsd4_bump_seqid(cstate, status); >> @@ -5196,7 +5203,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, >> if (status) >> goto out; >> nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid); >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> >> nfsd4_close_open_stateid(stp); >> >> @@ -5422,7 +5429,7 @@ 
init_lock_stateid(struct nfs4_ol_stateid *stp, struct nfs4_lockowner *lo, >> stp->st_access_bmap = 0; >> stp->st_deny_bmap = open_stp->st_deny_bmap; >> stp->st_openstp = open_stp; >> - init_rwsem(&stp->st_rwsem); >> + mutex_init(&stp->st_mutex); >> list_add(&stp->st_locks, &open_stp->st_locks); >> list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids); >> spin_lock(&fp->fi_lock); >> @@ -5591,7 +5598,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, >> &open_stp, nn); >> if (status) >> goto out; >> - up_write(&open_stp->st_rwsem); >> + mutex_unlock(&open_stp->st_mutex); >> open_sop = openowner(open_stp->st_stateowner); >> status = nfserr_bad_stateid; >> if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid, >> @@ -5600,7 +5607,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, >> status = lookup_or_create_lock_state(cstate, open_stp, lock, >> &lock_stp, &new); >> if (status == nfs_ok) >> - down_write(&lock_stp->st_rwsem); >> + mutex_lock(&lock_stp->st_mutex); >> } else { >> status = nfs4_preprocess_seqid_op(cstate, >> lock->lk_old_lock_seqid, >> @@ -5704,7 +5711,7 @@ out: >> seqid_mutating_err(ntohl(status))) >> lock_sop->lo_owner.so_seqid++; >> >> - up_write(&lock_stp->st_rwsem); >> + mutex_unlock(&lock_stp->st_mutex); >> >> /* >> * If this is a new, never-before-used stateid, and we are >> @@ -5874,7 +5881,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, >> fput: >> fput(filp); >> put_stateid: >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> nfs4_put_stid(&stp->st_stid); >> out: >> nfsd4_bump_seqid(cstate, status); >> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h >> index 986e51e..64053ea 100644 >> --- a/fs/nfsd/state.h >> +++ b/fs/nfsd/state.h >> @@ -535,7 +535,7 @@ struct nfs4_ol_stateid { >> unsigned char st_access_bmap; >> unsigned char st_deny_bmap; >> struct nfs4_ol_stateid *st_openstp; >> - struct rw_semaphore st_rwsem; >> + struct mutex st_mutex; 
>> }; >> >> static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s) >> -- >> 2.7.4 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] nfsd: Always lock state exclusively. 2016-06-14 15:53 ` Oleg Drokin @ 2016-06-14 18:50 ` J . Bruce Fields 2016-06-14 22:52 ` Jeff Layton 0 siblings, 1 reply; 45+ messages in thread From: J . Bruce Fields @ 2016-06-14 18:50 UTC (permalink / raw) To: Oleg Drokin; +Cc: Jeff Layton, linux-nfs, linux-kernel On Tue, Jun 14, 2016 at 11:53:27AM -0400, Oleg Drokin wrote: > > On Jun 14, 2016, at 11:38 AM, J . Bruce Fields wrote: > > > On Sun, Jun 12, 2016 at 09:26:27PM -0400, Oleg Drokin wrote: > >> It used to be the case that state had an rwlock that was locked for write > >> by downgrades, but for read for upgrades (opens). Well, the problem is > >> if there are two competing opens for the same state, they step on > >> each other toes potentially leading to leaking file descriptors > >> from the state structure, since access mode is a bitmap only set once. > >> > >> Extend the holding region around in nfsd4_process_open2() to avoid > >> racing entry into nfs4_get_vfs_file(). > >> Make init_open_stateid() return with locked stateid to be unlocked > >> by the caller. > >> > >> Now this version held up pretty well in my testing for 24 hours. > >> It still does not address the situation if during one of the racing > >> nfs4_get_vfs_file() calls we are getting an error from one (first?) > >> of them. This is to be addressed in a separate patch after having a > >> solid reproducer (potentially using some fault injection). 
> >> > >> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> > >> --- > >> fs/nfsd/nfs4state.c | 47 +++++++++++++++++++++++++++-------------------- > >> fs/nfsd/state.h | 2 +- > >> 2 files changed, 28 insertions(+), 21 deletions(-) > >> > >> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > >> index f5f82e1..fa5fb5a 100644 > >> --- a/fs/nfsd/nfs4state.c > >> +++ b/fs/nfsd/nfs4state.c > >> @@ -3487,6 +3487,10 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > >> struct nfs4_openowner *oo = open->op_openowner; > >> struct nfs4_ol_stateid *retstp = NULL; > >> > >> + /* We are moving these outside of the spinlocks to avoid the warnings */ > >> + mutex_init(&stp->st_mutex); > >> + mutex_lock(&stp->st_mutex); > >> + > >> spin_lock(&oo->oo_owner.so_client->cl_lock); > >> spin_lock(&fp->fi_lock); > >> > >> @@ -3502,13 +3506,14 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > >> stp->st_access_bmap = 0; > >> stp->st_deny_bmap = 0; > >> stp->st_openstp = NULL; > >> - init_rwsem(&stp->st_rwsem); > >> list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids); > >> list_add(&stp->st_perfile, &fp->fi_stateids); > >> > >> out_unlock: > >> spin_unlock(&fp->fi_lock); > >> spin_unlock(&oo->oo_owner.so_client->cl_lock); > >> + if (retstp) > >> + mutex_lock(&retstp->st_mutex); > >> return retstp; > > > > You're returning with both stp->st_mutex and retstp->st_mutex locked. > > Did you mean to drop that first lock in the (retstp) case, or am I > > missing something? > > Well, I think it's ok (perhaps worthy of a comment) it's that if we matched a different > retstp state, then stp is not used and either released right away or even > if reused, it would be reinitialized in another call to init_open_stateid(), > so it's fine? Oh, I see, you're right. 
Though I wouldn't have been surprised if that triggered some kind of warning--I guess it's OK here, but typically if I saw a structure freed that had a locked lock in it I'd be a little suspicious that somebody made a mistake. --b. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] nfsd: Always lock state exclusively. 2016-06-14 18:50 ` J . Bruce Fields @ 2016-06-14 22:52 ` Jeff Layton 2016-06-14 22:54 ` Oleg Drokin 0 siblings, 1 reply; 45+ messages in thread From: Jeff Layton @ 2016-06-14 22:52 UTC (permalink / raw) To: J . Bruce Fields, Oleg Drokin; +Cc: linux-nfs, linux-kernel On Tue, 2016-06-14 at 14:50 -0400, J . Bruce Fields wrote: > On Tue, Jun 14, 2016 at 11:53:27AM -0400, Oleg Drokin wrote: > > > > > > On Jun 14, 2016, at 11:38 AM, J . Bruce Fields wrote: > > > > > > > > On Sun, Jun 12, 2016 at 09:26:27PM -0400, Oleg Drokin wrote: > > > > > > > > It used to be the case that state had an rwlock that was locked for write > > > > by downgrades, but for read for upgrades (opens). Well, the problem is > > > > if there are two competing opens for the same state, they step on > > > > each other toes potentially leading to leaking file descriptors > > > > from the state structure, since access mode is a bitmap only set once. > > > > > > > > Extend the holding region around in nfsd4_process_open2() to avoid > > > > racing entry into nfs4_get_vfs_file(). > > > > Make init_open_stateid() return with locked stateid to be unlocked > > > > by the caller. > > > > > > > > Now this version held up pretty well in my testing for 24 hours. > > > > It still does not address the situation if during one of the racing > > > > nfs4_get_vfs_file() calls we are getting an error from one (first?) > > > > of them. This is to be addressed in a separate patch after having a > > > > solid reproducer (potentially using some fault injection). 
> > > > > > > > Signed-off-by: Oleg Drokin <green@linuxhacker.ru> > > > > --- > > > > fs/nfsd/nfs4state.c | 47 +++++++++++++++++++++++++++-------------------- > > > > fs/nfsd/state.h | 2 +- > > > > 2 files changed, 28 insertions(+), 21 deletions(-) > > > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > > > index f5f82e1..fa5fb5a 100644 > > > > --- a/fs/nfsd/nfs4state.c > > > > +++ b/fs/nfsd/nfs4state.c > > > > @@ -3487,6 +3487,10 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > > > > struct nfs4_openowner *oo = open->op_openowner; > > > > struct nfs4_ol_stateid *retstp = NULL; > > > > > > > > + /* We are moving these outside of the spinlocks to avoid the warnings */ > > > > + mutex_init(&stp->st_mutex); > > > > + mutex_lock(&stp->st_mutex); > > > > + > > > > spin_lock(&oo->oo_owner.so_client->cl_lock); > > > > spin_lock(&fp->fi_lock); > > > > > > > > @@ -3502,13 +3506,14 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > > > > stp->st_access_bmap = 0; > > > > stp->st_deny_bmap = 0; > > > > stp->st_openstp = NULL; > > > > - init_rwsem(&stp->st_rwsem); > > > > list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids); > > > > list_add(&stp->st_perfile, &fp->fi_stateids); > > > > > > > > out_unlock: > > > > spin_unlock(&fp->fi_lock); > > > > spin_unlock(&oo->oo_owner.so_client->cl_lock); > > > > + if (retstp) > > > > + mutex_lock(&retstp->st_mutex); > > > > return retstp; > > > You're returning with both stp->st_mutex and retstp->st_mutex locked. > > > Did you mean to drop that first lock in the (retstp) case, or am I > > > missing something? > > Well, I think it's ok (perhaps worthy of a comment) it's that if we matched a different > > retstp state, then stp is not used and either released right away or even > > if reused, it would be reinitialized in another call to init_open_stateid(), > > so it's fine? > Oh, I see, you're right. 
> > Though I wouldn't have been surprised if that triggered some kind of > warning--I guess it's OK here, but typically if I saw a structure freed > that had a locked lock in it I'd be a little suspicious that somebody > made a mistake. > > --b. I think I'd still prefer to have it unlock the mutex in the event that it's not going to use it after all. While that kind of thing is ok for now, it's stuff like that that can turn into a subtle source of bugs later. Also, I think I'd be more comfortable with this being split into (at least) two patches. Do one patch as a straight conversion from rwsem to mutex, and then another that changes the code to take the mutex before hashing the new stateid. -- Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] nfsd: Always lock state exclusively. 2016-06-14 22:52 ` Jeff Layton @ 2016-06-14 22:54 ` Oleg Drokin 2016-06-14 22:57 ` Jeff Layton 0 siblings, 1 reply; 45+ messages in thread From: Oleg Drokin @ 2016-06-14 22:54 UTC (permalink / raw) To: Jeff Layton; +Cc: J . Bruce Fields, linux-nfs, linux-kernel On Jun 14, 2016, at 6:52 PM, Jeff Layton wrote: > I think I'd still prefer to have it unlock the mutex in the event that > it's not going to use it after all. While that kind of thing is ok for > now, it's stuff like that that can turn into a subtle source of bugs > later. > > Also, I think I'd be more comfortable with this being split into (at > least) two patches. Do one patch as a straight conversion from rwsem to > mutex, and then another that changes the code to take the mutex before > hashing the new stateid. Ok, I guess that could be arranged too. And then there's Bruce's patch to pull more stuff into init_open_stateid(). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] nfsd: Always lock state exclusively. 2016-06-14 22:54 ` Oleg Drokin @ 2016-06-14 22:57 ` Jeff Layton 2016-06-15 3:28 ` [PATCH 0/3] nfsd state handling fixes Oleg Drokin 0 siblings, 1 reply; 45+ messages in thread From: Jeff Layton @ 2016-06-14 22:57 UTC (permalink / raw) To: Oleg Drokin; +Cc: J . Bruce Fields, linux-nfs, linux-kernel On Tue, 2016-06-14 at 18:54 -0400, Oleg Drokin wrote: > On Jun 14, 2016, at 6:52 PM, Jeff Layton wrote: > > > > I think I'd still prefer to have it unlock the mutex in the event that > > it's not going to use it after all. While that kind of thing is ok for > > now, it's stuff like that that can turn into a subtle source of bugs > > later. > > > > Also, I think I'd be more comfortable with this being split into (at > > least) two patches. Do one patch as a straight conversion from rwsem to > > mutex, and then another that changes the code to take the mutex before > > hashing the new stateid. > Ok, I guess that could be arranged too. > > And then there's this Bruce's patch to pull more stuff into the init_open_stateid Yeah, that seems like a good idea. We _really_ need an effort to simplify this code. OPEN handling is always messy, but the current code is really much messier than it should be. -- Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 0/3] nfsd state handling fixes 2016-06-14 22:57 ` Jeff Layton @ 2016-06-15 3:28 ` Oleg Drokin 2016-06-15 3:28 ` [PATCH 1/3] nfsd: Always lock state exclusively Oleg Drokin ` (3 more replies) 0 siblings, 4 replies; 45+ messages in thread From: Oleg Drokin @ 2016-06-15 3:28 UTC (permalink / raw) To: Jeff Layton, J . Bruce Fields; +Cc: linux-nfs, linux-kernel, Oleg Drokin These three patches do the much discussed job of making nfsd state handling more robust in the face of races where several opens arrive for the same file at the same time from the same client. This does not yet handle the case where one of those opens gets an error and the others don't. Also this is undergoing testing ATM, so please only use it as a discussion/review piece for now. Oleg Drokin (3): nfsd: Always lock state exclusively. nfsd: Extend the mutex holding region around in nfsd4_process_open2() nfsd: Make init_open_stateid() a bit more whole fs/nfsd/nfs4state.c | 67 +++++++++++++++++++++++++++++------------------ fs/nfsd/state.h | 2 +- 2 files changed, 38 insertions(+), 31 deletions(-) -- 2.7.4 ^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 1/3] nfsd: Always lock state exclusively. 2016-06-15 3:28 ` [PATCH 0/3] nfsd state handling fixes Oleg Drokin @ 2016-06-15 3:28 ` Oleg Drokin 2016-06-15 3:28 ` [PATCH 2/3] nfsd: Extend the mutex holding region around in nfsd4_process_open2() Oleg Drokin ` (2 subsequent siblings) 3 siblings, 0 replies; 45+ messages in thread From: Oleg Drokin @ 2016-06-15 3:28 UTC (permalink / raw) To: Jeff Layton, J . Bruce Fields; +Cc: linux-nfs, linux-kernel, Oleg Drokin It used to be the case that state had an rwlock that was locked for write by downgrades, but for read for upgrades (opens). Well, the problem is if there are two competing opens for the same state, they step on each other toes potentially leading to leaking file descriptors from the state structure, since access mode is a bitmap only set once. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> --- fs/nfsd/nfs4state.c | 40 ++++++++++++++++++++-------------------- fs/nfsd/state.h | 2 +- 2 files changed, 21 insertions(+), 21 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f5f82e1..c927d36 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3502,7 +3502,7 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, stp->st_access_bmap = 0; stp->st_deny_bmap = 0; stp->st_openstp = NULL; - init_rwsem(&stp->st_rwsem); + mutex_init(&stp->st_mutex); list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids); list_add(&stp->st_perfile, &fp->fi_stateids); @@ -4335,10 +4335,10 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf */ if (stp) { /* Stateid was found, this is an OPEN upgrade */ - down_read(&stp->st_rwsem); + mutex_lock(&stp->st_mutex); status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); if (status) { - up_read(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); goto out; } } else { @@ -4348,19 +4348,19 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf if (swapstp) { 
nfs4_put_stid(&stp->st_stid); stp = swapstp; - down_read(&stp->st_rwsem); + mutex_lock(&stp->st_mutex); status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); if (status) { - up_read(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); goto out; } goto upgrade_out; } - down_read(&stp->st_rwsem); + mutex_lock(&stp->st_mutex); status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); if (status) { - up_read(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); release_open_stateid(stp); goto out; } @@ -4372,7 +4372,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf } upgrade_out: nfs4_inc_and_copy_stateid(&open->op_stateid, &stp->st_stid); - up_read(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); if (nfsd4_has_session(&resp->cstate)) { if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) { @@ -4977,12 +4977,12 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_ * revoked delegations are kept only for free_stateid. */ return nfserr_bad_stateid; - down_write(&stp->st_rwsem); + mutex_lock(&stp->st_mutex); status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate)); if (status == nfs_ok) status = nfs4_check_fh(current_fh, &stp->st_stid); if (status != nfs_ok) - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); return status; } @@ -5030,7 +5030,7 @@ static __be32 nfs4_preprocess_confirmed_seqid_op(struct nfsd4_compound_state *cs return status; oo = openowner(stp->st_stateowner); if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) { - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); nfs4_put_stid(&stp->st_stid); return nfserr_bad_stateid; } @@ -5062,12 +5062,12 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, oo = openowner(stp->st_stateowner); status = nfserr_bad_stateid; if (oo->oo_flags & NFS4_OO_CONFIRMED) { - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); goto put_stateid; } oo->oo_flags |= NFS4_OO_CONFIRMED; 
nfs4_inc_and_copy_stateid(&oc->oc_resp_stateid, &stp->st_stid); - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n", __func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid)); @@ -5143,7 +5143,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp, nfs4_inc_and_copy_stateid(&od->od_stateid, &stp->st_stid); status = nfs_ok; put_stateid: - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); nfs4_put_stid(&stp->st_stid); out: nfsd4_bump_seqid(cstate, status); @@ -5196,7 +5196,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto out; nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid); - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); nfsd4_close_open_stateid(stp); @@ -5422,7 +5422,7 @@ init_lock_stateid(struct nfs4_ol_stateid *stp, struct nfs4_lockowner *lo, stp->st_access_bmap = 0; stp->st_deny_bmap = open_stp->st_deny_bmap; stp->st_openstp = open_stp; - init_rwsem(&stp->st_rwsem); + mutex_init(&stp->st_mutex); list_add(&stp->st_locks, &open_stp->st_locks); list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids); spin_lock(&fp->fi_lock); @@ -5591,7 +5591,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, &open_stp, nn); if (status) goto out; - up_write(&open_stp->st_rwsem); + mutex_unlock(&open_stp->st_mutex); open_sop = openowner(open_stp->st_stateowner); status = nfserr_bad_stateid; if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid, @@ -5600,7 +5600,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = lookup_or_create_lock_state(cstate, open_stp, lock, &lock_stp, &new); if (status == nfs_ok) - down_write(&lock_stp->st_rwsem); + mutex_lock(&lock_stp->st_mutex); } else { status = nfs4_preprocess_seqid_op(cstate, lock->lk_old_lock_seqid, @@ -5704,7 +5704,7 @@ out: seqid_mutating_err(ntohl(status))) lock_sop->lo_owner.so_seqid++; - up_write(&lock_stp->st_rwsem); + 
mutex_unlock(&lock_stp->st_mutex); /* * If this is a new, never-before-used stateid, and we are @@ -5874,7 +5874,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fput: fput(filp); put_stateid: - up_write(&stp->st_rwsem); + mutex_unlock(&stp->st_mutex); nfs4_put_stid(&stp->st_stid); out: nfsd4_bump_seqid(cstate, status); diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 986e51e..64053ea 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -535,7 +535,7 @@ struct nfs4_ol_stateid { unsigned char st_access_bmap; unsigned char st_deny_bmap; struct nfs4_ol_stateid *st_openstp; - struct rw_semaphore st_rwsem; + struct mutex st_mutex; }; static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s) -- 2.7.4 ^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH 2/3] nfsd: Extend the mutex holding region around in nfsd4_process_open2() 2016-06-15 3:28 ` [PATCH 0/3] nfsd state handling fixes Oleg Drokin 2016-06-15 3:28 ` [PATCH 1/3] nfsd: Always lock state exclusively Oleg Drokin @ 2016-06-15 3:28 ` Oleg Drokin 2016-06-15 3:28 ` [PATCH 3/3] nfsd: Make init_open_stateid() a bit more whole Oleg Drokin 2016-06-16 1:54 ` [PATCH 0/3] nfsd state handling fixes Oleg Drokin 3 siblings, 0 replies; 45+ messages in thread From: Oleg Drokin @ 2016-06-15 3:28 UTC (permalink / raw) To: Jeff Layton, J . Bruce Fields; +Cc: linux-nfs, linux-kernel, Oleg Drokin To avoid racing entry into nfs4_get_vfs_file(). Make init_open_stateid() return with locked stateid to be unlocked by the caller. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> --- fs/nfsd/nfs4state.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c927d36..94854a0 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3487,6 +3487,10 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, struct nfs4_openowner *oo = open->op_openowner; struct nfs4_ol_stateid *retstp = NULL; + /* We are moving these outside of the spinlocks to avoid the warnings */ + mutex_init(&stp->st_mutex); + mutex_lock(&stp->st_mutex); + spin_lock(&oo->oo_owner.so_client->cl_lock); spin_lock(&fp->fi_lock); @@ -3502,13 +3506,17 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, stp->st_access_bmap = 0; stp->st_deny_bmap = 0; stp->st_openstp = NULL; - mutex_init(&stp->st_mutex); list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids); list_add(&stp->st_perfile, &fp->fi_stateids); out_unlock: spin_unlock(&fp->fi_lock); spin_unlock(&oo->oo_owner.so_client->cl_lock); + if (retstp) { + mutex_lock(&retstp->st_mutex); + /* Not that we need to, just for neatness */ + mutex_unlock(&stp->st_mutex); + } return retstp; } @@ -4344,11 +4352,14 @@ nfsd4_process_open2(struct svc_rqst 
*rqstp, struct svc_fh *current_fh, struct nf } else { stp = open->op_stp; open->op_stp = NULL; + /* + * init_open_stateid() either returns a locked stateid + * it found, or initializes and locks the new one we passed in + */ swapstp = init_open_stateid(stp, fp, open); if (swapstp) { nfs4_put_stid(&stp->st_stid); stp = swapstp; - mutex_lock(&stp->st_mutex); status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); if (status) { @@ -4357,7 +4368,6 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf } goto upgrade_out; } - mutex_lock(&stp->st_mutex); status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); if (status) { mutex_unlock(&stp->st_mutex); -- 2.7.4 ^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH 3/3] nfsd: Make init_open_stateid() a bit more whole 2016-06-15 3:28 ` [PATCH 0/3] nfsd state handling fixes Oleg Drokin 2016-06-15 3:28 ` [PATCH 1/3] nfsd: Always lock state exclusively Oleg Drokin 2016-06-15 3:28 ` [PATCH 2/3] nfsd: Extend the mutex holding region around in nfsd4_process_open2() Oleg Drokin @ 2016-06-15 3:28 ` Oleg Drokin 2016-06-16 1:54 ` [PATCH 0/3] nfsd state handling fixes Oleg Drokin 3 siblings, 0 replies; 45+ messages in thread From: Oleg Drokin @ 2016-06-15 3:28 UTC (permalink / raw) To: Jeff Layton, J . Bruce Fields; +Cc: linux-nfs, linux-kernel, Oleg Drokin Move the state selection logic inside from the caller, always making it return correct stp to use. Signed-off-by: J . Bruce Fields <bfields@fieldses.org> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> --- fs/nfsd/nfs4state.c | 27 ++++++++++++--------------- 1 file changed, 12 insertions(+), 15 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 94854a0..70d0b9b 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3480,13 +3480,14 @@ alloc_init_open_stateowner(unsigned int strhashval, struct nfsd4_open *open, } static struct nfs4_ol_stateid * -init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, - struct nfsd4_open *open) +init_open_stateid(struct nfs4_file *fp, struct nfsd4_open *open) { struct nfs4_openowner *oo = open->op_openowner; struct nfs4_ol_stateid *retstp = NULL; + struct nfs4_ol_stateid *stp; + stp = open->op_stp; /* We are moving these outside of the spinlocks to avoid the warnings */ mutex_init(&stp->st_mutex); mutex_lock(&stp->st_mutex); @@ -3497,6 +3498,8 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, retstp = nfsd4_find_existing_open(fp, open); if (retstp) goto out_unlock; + + open->op_stp = NULL; atomic_inc(&stp->st_stid.sc_count); stp->st_stid.sc_type = NFS4_OPEN_STID; INIT_LIST_HEAD(&stp->st_locks); @@ -3514,10 +3517,11 @@ out_unlock: spin_unlock(&oo->oo_owner.so_client->cl_lock); 
if (retstp) { mutex_lock(&retstp->st_mutex); - /* Not that we need to, just for neatness */ + /* To keep mutex tracking happy */ mutex_unlock(&stp->st_mutex); + stp = retstp; } - return retstp; + return stp; } /* @@ -4313,7 +4317,6 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf struct nfs4_client *cl = open->op_openowner->oo_owner.so_client; struct nfs4_file *fp = NULL; struct nfs4_ol_stateid *stp = NULL; - struct nfs4_ol_stateid *swapstp = NULL; struct nfs4_delegation *dp = NULL; __be32 status; @@ -4350,16 +4353,10 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf goto out; } } else { - stp = open->op_stp; - open->op_stp = NULL; - /* - * init_open_stateid() either returns a locked stateid - * it found, or initializes and locks the new one we passed in - */ - swapstp = init_open_stateid(stp, fp, open); - if (swapstp) { - nfs4_put_stid(&stp->st_stid); - stp = swapstp; + /* stp is returned locked. */ + stp = init_open_stateid(fp, open); + /* See if we lost the race to some other thread */ + if (stp->st_access_bmap != 0) { status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); if (status) { -- 2.7.4 ^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH 0/3] nfsd state handling fixes 2016-06-15 3:28 ` [PATCH 0/3] nfsd state handling fixes Oleg Drokin ` (2 preceding siblings ...) 2016-06-15 3:28 ` [PATCH 3/3] nfsd: Make init_open_stateid() a bit more whole Oleg Drokin @ 2016-06-16 1:54 ` Oleg Drokin 2016-06-16 2:07 ` J . Bruce Fields 3 siblings, 1 reply; 45+ messages in thread From: Oleg Drokin @ 2016-06-16 1:54 UTC (permalink / raw) To: Jeff Layton, J . Bruce Fields; +Cc: linux-nfs, linux-kernel On Jun 14, 2016, at 11:28 PM, Oleg Drokin wrote: > These three patches do the much discussed job of making nfsd state handling > more robust in face of races where several opens arrive for the same file > at the same time from the same client. > > This does not yet handle a case when one of those opens gets an error > and others don't. > > Also this is undergoing testing ATM, so please only use it as a > discussion/review piece for now. With 24 hours in testing and no problems encountered, I guess it's safe to declare this patchset as good to go if nobody has any objections against it. > > Oleg Drokin (3): > nfsd: Always lock state exclusively. > nfsd: Extend the mutex holding region around in nfsd4_process_open2() > nfsd: Make init_open_stateid() a bit more whole > > fs/nfsd/nfs4state.c | 67 +++++++++++++++++++++++++++++------------------ > fs/nfsd/state.h | 2 +- > 2 files changed, 38 insertions(+), 31 deletions(-) > > -- > 2.7.4 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 0/3] nfsd state handling fixes 2016-06-16 1:54 ` [PATCH 0/3] nfsd state handling fixes Oleg Drokin @ 2016-06-16 2:07 ` J . Bruce Fields 0 siblings, 0 replies; 45+ messages in thread From: J . Bruce Fields @ 2016-06-16 2:07 UTC (permalink / raw) To: Oleg Drokin; +Cc: Jeff Layton, linux-nfs, linux-kernel On Wed, Jun 15, 2016 at 09:54:28PM -0400, Oleg Drokin wrote: > > On Jun 14, 2016, at 11:28 PM, Oleg Drokin wrote: > > > These three patches do the much discussed job of making nfsd state handling > > more robust in face of races where several opens arrive for the same file > > at the same time from the same client. > > > > This does not yet handle a case when one of those opens gets an error > > and others don't. > > > > Also this is undergoing testing ATM, so please only use it as a > > discussion/review piece for now. > > With 24 hours in testing an no problems encountered, I guess > it's safe to declare this patchset as good to go if nobody has any > objections against it. Great, thanks for the testing, and all the work tracking this down; committing for 4.7 (and the first two for stable). --b. > > > > > Oleg Drokin (3): > > nfsd: Always lock state exclusively. > > nfsd: Extend the mutex holding region around in nfsd4_process_open2() > > nfsd: Make init_open_stateid() a bit more whole > > > > fs/nfsd/nfs4state.c | 67 +++++++++++++++++++++++++++++------------------------ > > fs/nfsd/state.h | 2 +- > > 2 files changed, 38 insertions(+), 31 deletions(-) > > > > -- > > 2.7.4 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] nfsd: Always lock state exclusively. 2016-06-13 1:26 ` [PATCH v2] nfsd: Always lock state exclusively Oleg Drokin 2016-06-14 15:38 ` J . Bruce Fields @ 2016-06-14 15:46 ` J . Bruce Fields 2016-06-14 15:56 ` Oleg Drokin 1 sibling, 1 reply; 45+ messages in thread From: J . Bruce Fields @ 2016-06-14 15:46 UTC (permalink / raw) To: Oleg Drokin; +Cc: Jeff Layton, linux-nfs, linux-kernel On Sun, Jun 12, 2016 at 09:26:27PM -0400, Oleg Drokin wrote: > It used to be the case that state had an rwlock that was locked for write > by downgrades, but for read for upgrades (opens). Well, the problem is > if there are two competing opens for the same state, they step on > each other toes potentially leading to leaking file descriptors > from the state structure, since access mode is a bitmap only set once. > > Extend the holding region around in nfsd4_process_open2() to avoid > racing entry into nfs4_get_vfs_file(). > Make init_open_stateid() return with locked stateid to be unlocked > by the caller. > > Now this version held up pretty well in my testing for 24 hours. > It still does not address the situation if during one of the racing > nfs4_get_vfs_file() calls we are getting an error from one (first?) > of them. This is to be addressed in a separate patch after having a > solid reproducer (potentially using some fault injection). 
> > Signed-off-by: Oleg Drokin <green@linuxhacker.ru> > --- > fs/nfsd/nfs4state.c | 47 +++++++++++++++++++++++++++-------------------- > fs/nfsd/state.h | 2 +- > 2 files changed, 28 insertions(+), 21 deletions(-) > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index f5f82e1..fa5fb5a 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -3487,6 +3487,10 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > struct nfs4_openowner *oo = open->op_openowner; > struct nfs4_ol_stateid *retstp = NULL; > > + /* We are moving these outside of the spinlocks to avoid the warnings */ > + mutex_init(&stp->st_mutex); > + mutex_lock(&stp->st_mutex); A mutex_init_locked() primitive might also be convenient here. You could also take the two previous lines from the caller into this function instead of passing in stp, that might simplify the code. (Haven't checked.) --b. > + > spin_lock(&oo->oo_owner.so_client->cl_lock); > spin_lock(&fp->fi_lock); > > @@ -3502,13 +3506,14 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > stp->st_access_bmap = 0; > stp->st_deny_bmap = 0; > stp->st_openstp = NULL; > - init_rwsem(&stp->st_rwsem); > list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids); > list_add(&stp->st_perfile, &fp->fi_stateids); > > out_unlock: > spin_unlock(&fp->fi_lock); > spin_unlock(&oo->oo_owner.so_client->cl_lock); > + if (retstp) > + mutex_lock(&retstp->st_mutex); > return retstp; > } > > @@ -4335,32 +4340,34 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf > */ > if (stp) { > /* Stateid was found, this is an OPEN upgrade */ > - down_read(&stp->st_rwsem); > + mutex_lock(&stp->st_mutex); > status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); > if (status) { > - up_read(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > goto out; > } > } else { > stp = open->op_stp; > open->op_stp = NULL; > + /* > + * init_open_stateid() either returns a locked stateid > + * 
it found, or initializes and locks the new one we passed in > + */ > swapstp = init_open_stateid(stp, fp, open); > if (swapstp) { > nfs4_put_stid(&stp->st_stid); > stp = swapstp; > - down_read(&stp->st_rwsem); > status = nfs4_upgrade_open(rqstp, fp, current_fh, > stp, open); > if (status) { > - up_read(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > goto out; > } > goto upgrade_out; > } > - down_read(&stp->st_rwsem); > status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); > if (status) { > - up_read(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > release_open_stateid(stp); > goto out; > } > @@ -4372,7 +4379,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf > } > upgrade_out: > nfs4_inc_and_copy_stateid(&open->op_stateid, &stp->st_stid); > - up_read(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > > if (nfsd4_has_session(&resp->cstate)) { > if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) { > @@ -4977,12 +4984,12 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_ > * revoked delegations are kept only for free_stateid. 
> */ > return nfserr_bad_stateid; > - down_write(&stp->st_rwsem); > + mutex_lock(&stp->st_mutex); > status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate)); > if (status == nfs_ok) > status = nfs4_check_fh(current_fh, &stp->st_stid); > if (status != nfs_ok) > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > return status; > } > > @@ -5030,7 +5037,7 @@ static __be32 nfs4_preprocess_confirmed_seqid_op(struct nfsd4_compound_state *cs > return status; > oo = openowner(stp->st_stateowner); > if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) { > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > nfs4_put_stid(&stp->st_stid); > return nfserr_bad_stateid; > } > @@ -5062,12 +5069,12 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > oo = openowner(stp->st_stateowner); > status = nfserr_bad_stateid; > if (oo->oo_flags & NFS4_OO_CONFIRMED) { > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > goto put_stateid; > } > oo->oo_flags |= NFS4_OO_CONFIRMED; > nfs4_inc_and_copy_stateid(&oc->oc_resp_stateid, &stp->st_stid); > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n", > __func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid)); > > @@ -5143,7 +5150,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp, > nfs4_inc_and_copy_stateid(&od->od_stateid, &stp->st_stid); > status = nfs_ok; > put_stateid: > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > nfs4_put_stid(&stp->st_stid); > out: > nfsd4_bump_seqid(cstate, status); > @@ -5196,7 +5203,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > if (status) > goto out; > nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid); > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > > nfsd4_close_open_stateid(stp); > > @@ -5422,7 +5429,7 @@ init_lock_stateid(struct nfs4_ol_stateid *stp, struct 
nfs4_lockowner *lo, > stp->st_access_bmap = 0; > stp->st_deny_bmap = open_stp->st_deny_bmap; > stp->st_openstp = open_stp; > - init_rwsem(&stp->st_rwsem); > + mutex_init(&stp->st_mutex); > list_add(&stp->st_locks, &open_stp->st_locks); > list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids); > spin_lock(&fp->fi_lock); > @@ -5591,7 +5598,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > &open_stp, nn); > if (status) > goto out; > - up_write(&open_stp->st_rwsem); > + mutex_unlock(&open_stp->st_mutex); > open_sop = openowner(open_stp->st_stateowner); > status = nfserr_bad_stateid; > if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid, > @@ -5600,7 +5607,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > status = lookup_or_create_lock_state(cstate, open_stp, lock, > &lock_stp, &new); > if (status == nfs_ok) > - down_write(&lock_stp->st_rwsem); > + mutex_lock(&lock_stp->st_mutex); > } else { > status = nfs4_preprocess_seqid_op(cstate, > lock->lk_old_lock_seqid, > @@ -5704,7 +5711,7 @@ out: > seqid_mutating_err(ntohl(status))) > lock_sop->lo_owner.so_seqid++; > > - up_write(&lock_stp->st_rwsem); > + mutex_unlock(&lock_stp->st_mutex); > > /* > * If this is a new, never-before-used stateid, and we are > @@ -5874,7 +5881,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > fput: > fput(filp); > put_stateid: > - up_write(&stp->st_rwsem); > + mutex_unlock(&stp->st_mutex); > nfs4_put_stid(&stp->st_stid); > out: > nfsd4_bump_seqid(cstate, status); > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h > index 986e51e..64053ea 100644 > --- a/fs/nfsd/state.h > +++ b/fs/nfsd/state.h > @@ -535,7 +535,7 @@ struct nfs4_ol_stateid { > unsigned char st_access_bmap; > unsigned char st_deny_bmap; > struct nfs4_ol_stateid *st_openstp; > - struct rw_semaphore st_rwsem; > + struct mutex st_mutex; > }; > > static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s) > -- > 2.7.4 ^ 
permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] nfsd: Always lock state exclusively. 2016-06-14 15:46 ` [PATCH v2] nfsd: Always lock state exclusively J . Bruce Fields @ 2016-06-14 15:56 ` Oleg Drokin 2016-06-14 18:46 ` J . Bruce Fields 0 siblings, 1 reply; 45+ messages in thread From: Oleg Drokin @ 2016-06-14 15:56 UTC (permalink / raw) To: J . Bruce Fields; +Cc: Jeff Layton, linux-nfs, linux-kernel On Jun 14, 2016, at 11:46 AM, J . Bruce Fields wrote: > On Sun, Jun 12, 2016 at 09:26:27PM -0400, Oleg Drokin wrote: >> It used to be the case that state had an rwlock that was locked for write >> by downgrades, but for read for upgrades (opens). Well, the problem is >> if there are two competing opens for the same state, they step on >> each other toes potentially leading to leaking file descriptors >> from the state structure, since access mode is a bitmap only set once. >> >> Extend the holding region around in nfsd4_process_open2() to avoid >> racing entry into nfs4_get_vfs_file(). >> Make init_open_stateid() return with locked stateid to be unlocked >> by the caller. >> >> Now this version held up pretty well in my testing for 24 hours. >> It still does not address the situation if during one of the racing >> nfs4_get_vfs_file() calls we are getting an error from one (first?) >> of them. This is to be addressed in a separate patch after having a >> solid reproducer (potentially using some fault injection). 
>> >> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> >> --- >> fs/nfsd/nfs4state.c | 47 +++++++++++++++++++++++++++-------------------- >> fs/nfsd/state.h | 2 +- >> 2 files changed, 28 insertions(+), 21 deletions(-) >> >> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >> index f5f82e1..fa5fb5a 100644 >> --- a/fs/nfsd/nfs4state.c >> +++ b/fs/nfsd/nfs4state.c >> @@ -3487,6 +3487,10 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, >> struct nfs4_openowner *oo = open->op_openowner; >> struct nfs4_ol_stateid *retstp = NULL; >> >> + /* We are moving these outside of the spinlocks to avoid the warnings */ >> + mutex_init(&stp->st_mutex); >> + mutex_lock(&stp->st_mutex); > > A mutex_init_locked() primitive might also be convenient here. I know! I would be able to do it under the spinlock then, without moving this around too. But alas, not only is there no such primitive, the mutex documentation states that this is disallowed. > You could also take the two previous lines from the caller into this function instead of passing in stp, that might simplify the code. > (Haven't checked.) I am not really sure what you mean here. These lines are moved from further away in this function (well, just the init, anyway). Having half the initialisation of stp here and half in the caller sounds kind of strange to me. > > --b.
> >> + >> spin_lock(&oo->oo_owner.so_client->cl_lock); >> spin_lock(&fp->fi_lock); >> >> @@ -3502,13 +3506,14 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, >> stp->st_access_bmap = 0; >> stp->st_deny_bmap = 0; >> stp->st_openstp = NULL; >> - init_rwsem(&stp->st_rwsem); >> list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids); >> list_add(&stp->st_perfile, &fp->fi_stateids); >> >> out_unlock: >> spin_unlock(&fp->fi_lock); >> spin_unlock(&oo->oo_owner.so_client->cl_lock); >> + if (retstp) >> + mutex_lock(&retstp->st_mutex); >> return retstp; >> } >> >> @@ -4335,32 +4340,34 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf >> */ >> if (stp) { >> /* Stateid was found, this is an OPEN upgrade */ >> - down_read(&stp->st_rwsem); >> + mutex_lock(&stp->st_mutex); >> status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); >> if (status) { >> - up_read(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> goto out; >> } >> } else { >> stp = open->op_stp; >> open->op_stp = NULL; >> + /* >> + * init_open_stateid() either returns a locked stateid >> + * it found, or initializes and locks the new one we passed in >> + */ >> swapstp = init_open_stateid(stp, fp, open); >> if (swapstp) { >> nfs4_put_stid(&stp->st_stid); >> stp = swapstp; >> - down_read(&stp->st_rwsem); >> status = nfs4_upgrade_open(rqstp, fp, current_fh, >> stp, open); >> if (status) { >> - up_read(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> goto out; >> } >> goto upgrade_out; >> } >> - down_read(&stp->st_rwsem); >> status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); >> if (status) { >> - up_read(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> release_open_stateid(stp); >> goto out; >> } >> @@ -4372,7 +4379,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf >> } >> upgrade_out: >> nfs4_inc_and_copy_stateid(&open->op_stateid, &stp->st_stid); >> - up_read(&stp->st_rwsem); >> + 
mutex_unlock(&stp->st_mutex); >> >> if (nfsd4_has_session(&resp->cstate)) { >> if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) { >> @@ -4977,12 +4984,12 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_ >> * revoked delegations are kept only for free_stateid. >> */ >> return nfserr_bad_stateid; >> - down_write(&stp->st_rwsem); >> + mutex_lock(&stp->st_mutex); >> status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate)); >> if (status == nfs_ok) >> status = nfs4_check_fh(current_fh, &stp->st_stid); >> if (status != nfs_ok) >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> return status; >> } >> >> @@ -5030,7 +5037,7 @@ static __be32 nfs4_preprocess_confirmed_seqid_op(struct nfsd4_compound_state *cs >> return status; >> oo = openowner(stp->st_stateowner); >> if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) { >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> nfs4_put_stid(&stp->st_stid); >> return nfserr_bad_stateid; >> } >> @@ -5062,12 +5069,12 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, >> oo = openowner(stp->st_stateowner); >> status = nfserr_bad_stateid; >> if (oo->oo_flags & NFS4_OO_CONFIRMED) { >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> goto put_stateid; >> } >> oo->oo_flags |= NFS4_OO_CONFIRMED; >> nfs4_inc_and_copy_stateid(&oc->oc_resp_stateid, &stp->st_stid); >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n", >> __func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid)); >> >> @@ -5143,7 +5150,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp, >> nfs4_inc_and_copy_stateid(&od->od_stateid, &stp->st_stid); >> status = nfs_ok; >> put_stateid: >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> nfs4_put_stid(&stp->st_stid); >> out: >> nfsd4_bump_seqid(cstate, status); >> @@ -5196,7 +5203,7 @@ 
nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, >> if (status) >> goto out; >> nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid); >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> >> nfsd4_close_open_stateid(stp); >> >> @@ -5422,7 +5429,7 @@ init_lock_stateid(struct nfs4_ol_stateid *stp, struct nfs4_lockowner *lo, >> stp->st_access_bmap = 0; >> stp->st_deny_bmap = open_stp->st_deny_bmap; >> stp->st_openstp = open_stp; >> - init_rwsem(&stp->st_rwsem); >> + mutex_init(&stp->st_mutex); >> list_add(&stp->st_locks, &open_stp->st_locks); >> list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids); >> spin_lock(&fp->fi_lock); >> @@ -5591,7 +5598,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, >> &open_stp, nn); >> if (status) >> goto out; >> - up_write(&open_stp->st_rwsem); >> + mutex_unlock(&open_stp->st_mutex); >> open_sop = openowner(open_stp->st_stateowner); >> status = nfserr_bad_stateid; >> if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid, >> @@ -5600,7 +5607,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, >> status = lookup_or_create_lock_state(cstate, open_stp, lock, >> &lock_stp, &new); >> if (status == nfs_ok) >> - down_write(&lock_stp->st_rwsem); >> + mutex_lock(&lock_stp->st_mutex); >> } else { >> status = nfs4_preprocess_seqid_op(cstate, >> lock->lk_old_lock_seqid, >> @@ -5704,7 +5711,7 @@ out: >> seqid_mutating_err(ntohl(status))) >> lock_sop->lo_owner.so_seqid++; >> >> - up_write(&lock_stp->st_rwsem); >> + mutex_unlock(&lock_stp->st_mutex); >> >> /* >> * If this is a new, never-before-used stateid, and we are >> @@ -5874,7 +5881,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, >> fput: >> fput(filp); >> put_stateid: >> - up_write(&stp->st_rwsem); >> + mutex_unlock(&stp->st_mutex); >> nfs4_put_stid(&stp->st_stid); >> out: >> nfsd4_bump_seqid(cstate, status); >> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h 
>> index 986e51e..64053ea 100644 >> --- a/fs/nfsd/state.h >> +++ b/fs/nfsd/state.h >> @@ -535,7 +535,7 @@ struct nfs4_ol_stateid { >> unsigned char st_access_bmap; >> unsigned char st_deny_bmap; >> struct nfs4_ol_stateid *st_openstp; >> - struct rw_semaphore st_rwsem; >> + struct mutex st_mutex; >> }; >> >> static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s) >> -- >> 2.7.4
* Re: [PATCH v2] nfsd: Always lock state exclusively. 2016-06-14 15:56 ` Oleg Drokin @ 2016-06-14 18:46 ` J . Bruce Fields 2016-06-15 2:19 ` Oleg Drokin 0 siblings, 1 reply; 45+ messages in thread From: J . Bruce Fields @ 2016-06-14 18:46 UTC (permalink / raw) To: Oleg Drokin; +Cc: Jeff Layton, linux-nfs, linux-kernel On Tue, Jun 14, 2016 at 11:56:20AM -0400, Oleg Drokin wrote: > > On Jun 14, 2016, at 11:46 AM, J . Bruce Fields wrote: > > > On Sun, Jun 12, 2016 at 09:26:27PM -0400, Oleg Drokin wrote: > >> It used to be the case that the state had an rw_semaphore that was locked for write > >> by downgrades, but for read by upgrades (opens). Well, the problem is that > >> if there are two competing opens for the same state, they step on > >> each other's toes, potentially leaking file descriptors > >> from the state structure, since the access mode is a bitmap that is only set once. > >> > >> Extend the region over which the lock is held in nfsd4_process_open2() to avoid > >> racing entry into nfs4_get_vfs_file(). > >> Make init_open_stateid() return with a locked stateid to be unlocked > >> by the caller. > >> > >> Now this version held up pretty well in my testing for 24 hours. > >> It still does not address the situation where, during one of the racing > >> nfs4_get_vfs_file() calls, we get an error from one (the first?) > >> of them. That is to be addressed in a separate patch after having a > >> solid reproducer (potentially using some fault injection).
> >> > >> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> > >> --- > >> fs/nfsd/nfs4state.c | 47 +++++++++++++++++++++++++++-------------------- > >> fs/nfsd/state.h | 2 +- > >> 2 files changed, 28 insertions(+), 21 deletions(-) > >> > >> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > >> index f5f82e1..fa5fb5a 100644 > >> --- a/fs/nfsd/nfs4state.c > >> +++ b/fs/nfsd/nfs4state.c > >> @@ -3487,6 +3487,10 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > >> struct nfs4_openowner *oo = open->op_openowner; > >> struct nfs4_ol_stateid *retstp = NULL; > >> > >> + /* We are moving these outside of the spinlocks to avoid the warnings */ > >> + mutex_init(&stp->st_mutex); > >> + mutex_lock(&stp->st_mutex); > > > > A mutex_init_locked() primitive might also be convenient here. > > I know! I would be able to do it under spinlock then without moving this around too. > > But alas, not only there is not one, mutex documentation states this is disallowed. You're just talking about this comment?: * It is not allowed to initialize an already locked mutex. That's a weird comment. You're probably right that what they meant was something like "It is not allowed to initialize a mutex to locked state". But, I don't know, taken literally that comment doesn't make sense (how could you even distinguish between an already-locked mutex and an uninitialized mutex?), so maybe it'd be worth asking. > > You could also take the two previous lines from the caller into this > > function instead of passing in stp, that might simplify the code. > > (Haven't checked.) > > I am not really sure what do you mean here. > These lines are moved from further away in this function )well, just the init, anyway). > > Having half initialisation of stp here and half in the caller sounds kind of strange > to me. I was thinking of something like the following--so init_open_stateid hides more of the details of the swapping. Untested. Does it look like an improvement to you?
There's got to be a way to make this code a little less convoluted.... --b. diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index fa5fb5aa4847..41b59854c40f 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3480,13 +3480,15 @@ alloc_init_open_stateowner(unsigned int strhashval, struct nfsd4_open *open, } static struct nfs4_ol_stateid * -init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, - struct nfsd4_open *open) +init_open_stateid(struct nfs4_file *fp, struct nfsd4_open *open) { struct nfs4_openowner *oo = open->op_openowner; struct nfs4_ol_stateid *retstp = NULL; + struct nfs4_ol_stateid *stp; + stp = open->op_stp; + open->op_stp = NULL; /* We are moving these outside of the spinlocks to avoid the warnings */ mutex_init(&stp->st_mutex); mutex_lock(&stp->st_mutex); @@ -3512,9 +3514,12 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, out_unlock: spin_unlock(&fp->fi_lock); spin_unlock(&oo->oo_owner.so_client->cl_lock); - if (retstp) - mutex_lock(&retstp->st_mutex); - return retstp; + if (retstp) { + nfs4_put_stid(&stp->st_stid); + stp = retstp; + mutex_lock(&stp->st_mutex); + } + return stp; } /* @@ -4310,7 +4315,6 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf struct nfs4_client *cl = open->op_openowner->oo_owner.so_client; struct nfs4_file *fp = NULL; struct nfs4_ol_stateid *stp = NULL; - struct nfs4_ol_stateid *swapstp = NULL; struct nfs4_delegation *dp = NULL; __be32 status; @@ -4347,16 +4351,9 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf goto out; } } else { - stp = open->op_stp; - open->op_stp = NULL; - /* - * init_open_stateid() either returns a locked stateid - * it found, or initializes and locks the new one we passed in - */ - swapstp = init_open_stateid(stp, fp, open); - if (swapstp) { - nfs4_put_stid(&stp->st_stid); - stp = swapstp; + /* stp is returned locked: */ + stp = init_open_stateid(fp, open); + if 
(stp->st_access_bmap == 0) { status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); if (status) {
* Re: [PATCH v2] nfsd: Always lock state exclusively. 2016-06-14 18:46 ` J . Bruce Fields @ 2016-06-15 2:19 ` Oleg Drokin 2016-06-15 13:31 ` J . Bruce Fields 0 siblings, 1 reply; 45+ messages in thread From: Oleg Drokin @ 2016-06-15 2:19 UTC (permalink / raw) To: J . Bruce Fields; +Cc: Jeff Layton, linux-nfs, linux-kernel On Jun 14, 2016, at 2:46 PM, J . Bruce Fields wrote: > On Tue, Jun 14, 2016 at 11:56:20AM -0400, Oleg Drokin wrote: >> >> On Jun 14, 2016, at 11:46 AM, J . Bruce Fields wrote: >> >>> On Sun, Jun 12, 2016 at 09:26:27PM -0400, Oleg Drokin wrote: >>>> It used to be the case that state had an rwlock that was locked for write >>>> by downgrades, but for read for upgrades (opens). Well, the problem is >>>> if there are two competing opens for the same state, they step on >>>> each other toes potentially leading to leaking file descriptors >>>> from the state structure, since access mode is a bitmap only set once. >>>> >>>> Extend the holding region around in nfsd4_process_open2() to avoid >>>> racing entry into nfs4_get_vfs_file(). >>>> Make init_open_stateid() return with locked stateid to be unlocked >>>> by the caller. >>>> >>>> Now this version held up pretty well in my testing for 24 hours. >>>> It still does not address the situation if during one of the racing >>>> nfs4_get_vfs_file() calls we are getting an error from one (first?) >>>> of them. This is to be addressed in a separate patch after having a >>>> solid reproducer (potentially using some fault injection). 
>>>> >>>> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> >>>> --- >>>> fs/nfsd/nfs4state.c | 47 +++++++++++++++++++++++++++-------------------- >>>> fs/nfsd/state.h | 2 +- >>>> 2 files changed, 28 insertions(+), 21 deletions(-) >>>> >>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >>>> index f5f82e1..fa5fb5a 100644 >>>> --- a/fs/nfsd/nfs4state.c >>>> +++ b/fs/nfsd/nfs4state.c >>>> @@ -3487,6 +3487,10 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, >>>> struct nfs4_openowner *oo = open->op_openowner; >>>> struct nfs4_ol_stateid *retstp = NULL; >>>> >>>> + /* We are moving these outside of the spinlocks to avoid the warnings */ >>>> + mutex_init(&stp->st_mutex); >>>> + mutex_lock(&stp->st_mutex); >>> >>> A mutex_init_locked() primitive might also be convenient here. >> >> I know! I would be able to do it under spinlock then without moving this around too. >> >> But alas, not only there is not one, mutex documentation states this is disallowed. > > You're just talking about this comment?: > > * It is not allowed to initialize an already locked mutex. > > That's a weird comment. You're proably right that what they meant was > something like "It is not allowed to initialize a mutex to locked > state". But, I don't know, taken literally that comment doesn't make > sense (how could you even distinguish between an already-locked mutex > and an uninitialized mutex?), so maybe it'd be worth asking. I think this is because of the strict ownership tracking or something. I guess I can ask. >>> You could also take the two previous lines from the caller into this >>> function instead of passing in stp, that might simplify the code. >>> (Haven't checked.) >> >> I am not really sure what do you mean here. >> These lines are moved from further away in this function )well, just the init, anyway). >> >> Having half initialisation of stp here and half in the caller sounds kind of strange >> to me. 
> > I was thinking of something like the following--so init_open_stateid > hides more of the details of the swapping. Untested. Does it look like > an improvement to you? > > There's got to be a way to make this code a little less convoluted.... > > --b. > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index fa5fb5aa4847..41b59854c40f 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -3480,13 +3480,15 @@ alloc_init_open_stateowner(unsigned int strhashval, struct nfsd4_open *open, > } > > static struct nfs4_ol_stateid * > -init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > - struct nfsd4_open *open) > +init_open_stateid(struct nfs4_file *fp, struct nfsd4_open *open) > { > > struct nfs4_openowner *oo = open->op_openowner; > struct nfs4_ol_stateid *retstp = NULL; > + struct nfs4_ol_stateid *stp; > > + stp = open->op_stp; > + open->op_stp = NULL; > /* We are moving these outside of the spinlocks to avoid the warnings */ > mutex_init(&stp->st_mutex); > mutex_lock(&stp->st_mutex); > @@ -3512,9 +3514,12 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > out_unlock: > spin_unlock(&fp->fi_lock); > spin_unlock(&oo->oo_owner.so_client->cl_lock); > - if (retstp) > - mutex_lock(&retstp->st_mutex); > - return retstp; > + if (retstp) { > + nfs4_put_stid(&stp->st_stid); So as I am trying to integrate this into my patchset, do we really need this? We don't if we took the other path and left this one hanging off the struct nfsd4_open (why do we need to assign it NULL before the search?) I imagine then we'd save some free/realloc churn as well? I assume struct nfsd4_open cannot be shared between threads? Otherwise we have bigger problems at hand like mutex init on a locked mutex from another thread and stuff. I'll try this theory I guess. 
> + stp = retstp; > + mutex_lock(&stp->st_mutex); > + } > + return stp; > } > > /* > @@ -4310,7 +4315,6 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf > struct nfs4_client *cl = open->op_openowner->oo_owner.so_client; > struct nfs4_file *fp = NULL; > struct nfs4_ol_stateid *stp = NULL; > - struct nfs4_ol_stateid *swapstp = NULL; > struct nfs4_delegation *dp = NULL; > __be32 status; > > @@ -4347,16 +4351,9 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf > goto out; > } > } else { > - stp = open->op_stp; > - open->op_stp = NULL; > - /* > - * init_open_stateid() either returns a locked stateid > - * it found, or initializes and locks the new one we passed in > - */ > - swapstp = init_open_stateid(stp, fp, open); > - if (swapstp) { > - nfs4_put_stid(&stp->st_stid); > - stp = swapstp; > + /* stp is returned locked: */ > + stp = init_open_stateid(fp, open); > + if (stp->st_access_bmap == 0) { > status = nfs4_upgrade_open(rqstp, fp, current_fh, > stp, open); > if (status) {
* Re: [PATCH v2] nfsd: Always lock state exclusively. 2016-06-15 2:19 ` Oleg Drokin @ 2016-06-15 13:31 ` J . Bruce Fields 0 siblings, 0 replies; 45+ messages in thread From: J . Bruce Fields @ 2016-06-15 13:31 UTC (permalink / raw) To: Oleg Drokin; +Cc: Jeff Layton, linux-nfs, linux-kernel On Tue, Jun 14, 2016 at 10:19:49PM -0400, Oleg Drokin wrote: > On Jun 14, 2016, at 2:46 PM, J . Bruce Fields wrote: > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > index fa5fb5aa4847..41b59854c40f 100644 > > --- a/fs/nfsd/nfs4state.c > > +++ b/fs/nfsd/nfs4state.c > > @@ -3480,13 +3480,15 @@ alloc_init_open_stateowner(unsigned int strhashval, struct nfsd4_open *open, > > } > > > > static struct nfs4_ol_stateid * > > -init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > > - struct nfsd4_open *open) > > +init_open_stateid(struct nfs4_file *fp, struct nfsd4_open *open) > > { > > > > struct nfs4_openowner *oo = open->op_openowner; > > struct nfs4_ol_stateid *retstp = NULL; > > + struct nfs4_ol_stateid *stp; > > > > + stp = open->op_stp; > > + open->op_stp = NULL; > > /* We are moving these outside of the spinlocks to avoid the warnings */ > > mutex_init(&stp->st_mutex); > > mutex_lock(&stp->st_mutex); > > @@ -3512,9 +3514,12 @@ init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp, > > out_unlock: > > spin_unlock(&fp->fi_lock); > > spin_unlock(&oo->oo_owner.so_client->cl_lock); > > - if (retstp) > > - mutex_lock(&retstp->st_mutex); > > - return retstp; > > + if (retstp) { > > + nfs4_put_stid(&stp->st_stid); > > So as I am trying to integrate this into my patchset, > do we really need this? > We don't if we took the other path and left this one > hanging off the struct nfsd4_open (why do we need to > assign it NULL before the search?) I imagine then > we'd save some free/realloc churn as well? Yes, good idea. > I assume struct nfsd4_open cannot be shared between threads? Right. 
> Otherwise we have bigger problems at hand like mutex init on a locked > mutex from another thread and stuff. > > I'll try this theory I guess. Sounds good! --b.
* Re: Files leak from nfsd in 4.7.1-rc1 (and more?) 2016-06-08 10:58 ` Jeff Layton 2016-06-08 14:44 ` Oleg Drokin 2016-06-08 16:10 ` Oleg Drokin @ 2016-06-09 12:13 ` Andrew W Elble 2 siblings, 0 replies; 45+ messages in thread From: Andrew W Elble @ 2016-06-09 12:13 UTC (permalink / raw) To: Jeff Layton Cc: Oleg Drokin, J. Bruce Fields, linux-nfs, <linux-kernel@vger.kernel.org> Mailing List > But...suppose we have two racing OPEN calls. They're both in > nfs4_get_vfs_file. One opens the file and succeeds and the other fails > and ends up in out_put_access. At that point, you could end up > clobbering the successful update to st_access_bmap from the other task, > and we'd end up not putting the file access references in > release_all_access. Pondering further (with an assumption about the workload), isn't this path also likely to involve a lease break in progress on that file? Thanks, Andy -- Andrew W. Elble aweits@discipline.rit.edu Infrastructure Engineer, Communications Technical Lead Rochester Institute of Technology PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912