* [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc
@ 2015-10-09  1:44 Kosuke Tatsukawa
  2015-10-09  5:56 ` Neil Brown
  0 siblings, 1 reply; 28+ messages in thread

From: Kosuke Tatsukawa @ 2015-10-09  1:44 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, J. Bruce Fields, Jeff Layton,
	David S. Miller
  Cc: linux-nfs, netdev, linux-kernel

There are several places in net/sunrpc/svcsock.c which call
waitqueue_active() without calling a memory barrier.  Add a memory
barrier just as in wq_has_sleeper().

I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c  (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
---
v2:
 - Fixed compiler warnings caused by type mismatch
v1:
 - https://lkml.org/lkml/2015/10/8/993
---
 net/sunrpc/svcsock.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 0c81202..ec19444 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -414,6 +414,7 @@ static void svc_udp_data_ready(struct sock *sk)
 		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
+	smp_mb();
 	if (wq && waitqueue_active(wq))
 		wake_up_interruptible(wq);
 }
@@ -432,6 +433,7 @@ static void svc_write_space(struct sock *sk)
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
 
+	smp_mb();
 	if (wq && waitqueue_active(wq)) {
 		dprintk("RPC svc_write_space: someone sleeping on %p\n",
 		       svsk);
@@ -787,6 +789,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 	}
 
 	wq = sk_sleep(sk);
+	smp_mb();
 	if (wq && waitqueue_active(wq))
 		wake_up_interruptible_all(wq);
 }
@@ -808,6 +811,7 @@ static void svc_tcp_state_change(struct sock *sk)
 		set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
+	smp_mb();
 	if (wq && waitqueue_active(wq))
 		wake_up_interruptible_all(wq);
 }
@@ -823,6 +827,7 @@ static void svc_tcp_data_ready(struct sock *sk)
 		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
+	smp_mb();
 	if (wq && waitqueue_active(wq))
 		wake_up_interruptible(wq);
 }
@@ -1594,6 +1599,7 @@ static void svc_sock_detach(struct svc_xprt *xprt)
 	sk->sk_write_space = svsk->sk_owspace;
 
 	wq = sk_sleep(sk);
+	smp_mb();
 	if (wq && waitqueue_active(wq))
 		wake_up_interruptible(wq);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread
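The ordering requirement the patch addresses can be sketched with a userspace analogue (an illustration of my own in C11 atomics and pthreads, not the kernel code: `cond_ready` stands in for the XPT_DATA bit, `nr_waiters` for a non-empty socket wait queue, and `atomic_thread_fence()` for smp_mb()). The waker must make the condition visible before it tests for waiters, and the waiter must register itself before it re-checks the condition; if either side's barrier is dropped, both checks can read stale values and the wakeup is lost.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int cond_ready;   /* stands in for the XPT_DATA bit          */
static atomic_int nr_waiters;   /* stands in for waitqueue_active(wq)      */
static atomic_int woken;        /* observable result of the demo           */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;

static void *waiter(void *arg)
{
	(void)arg;
	atomic_fetch_add(&nr_waiters, 1);          /* 1. register as a waiter   */
	atomic_thread_fence(memory_order_seq_cst); /* 2. pairs with waker fence */
	pthread_mutex_lock(&lock);
	while (!atomic_load(&cond_ready))          /* 3. re-check the condition */
		pthread_cond_wait(&cv, &lock);
	pthread_mutex_unlock(&lock);
	atomic_store(&woken, 1);
	return NULL;
}

static void waker(void)
{
	atomic_store(&cond_ready, 1);              /* A. make the condition true   */
	atomic_thread_fence(memory_order_seq_cst); /* B. the smp_mb() in the patch */
	if (atomic_load(&nr_waiters) > 0) {        /* C. only now test for waiters */
		pthread_mutex_lock(&lock);
		pthread_cond_broadcast(&cv);
		pthread_mutex_unlock(&lock);
	}
}

/* Run one waiter/waker round; returns 1 if the waiter was woken. */
static int run_demo(void)
{
	pthread_t t;

	atomic_store(&cond_ready, 0);
	atomic_store(&woken, 0);
	pthread_create(&t, NULL, waiter, NULL);
	while (atomic_load(&nr_waiters) == 0)      /* crude: wait for registration */
		;
	waker();
	pthread_join(t, NULL);
	return atomic_load(&woken);
}
```

The mutex/condvar pair supplies the actual sleep and wake; the two fences mirror the smp_mb() the patch inserts before each waitqueue_active() check.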
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc
  2015-10-09  1:44 [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc Kosuke Tatsukawa
@ 2015-10-09  5:56 ` Neil Brown
  2015-10-09  6:29   ` Kosuke Tatsukawa
  0 siblings, 1 reply; 28+ messages in thread

From: Neil Brown @ 2015-10-09  5:56 UTC (permalink / raw)
  To: Kosuke Tatsukawa, Trond Myklebust, Anna Schumaker, J. Bruce Fields,
	Jeff Layton, David S. Miller
  Cc: linux-nfs, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1551 bytes --]

Kosuke Tatsukawa <tatsu@ab.jp.nec.com> writes:

> There are several places in net/sunrpc/svcsock.c which call
> waitqueue_active() without calling a memory barrier.  Add a memory
> barrier just as in wq_has_sleeper().
>
> I found this issue when I was looking through the linux source code
> for places calling waitqueue_active() before wake_up*(), but without
> preceding memory barriers, after sending a patch to fix a similar
> issue in drivers/tty/n_tty.c  (Details about the original issue can be
> found here: https://lkml.org/lkml/2015/9/28/849).

hi,
this feels like the wrong approach to the problem.  It requires extra
smp_mb()s to be spread around, which are hard to understand and easy to
forget.

A quick look seems to suggest that (nearly) every waitqueue_active()
will need an smp_mb().  Could we just put the smp_mb() inside
waitqueue_active()??

Thanks,
NeilBrown

>
> Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
> ---
> v2:
>  - Fixed compiler warnings caused by type mismatch
> v1:
>  - https://lkml.org/lkml/2015/10/8/993
> ---
>  net/sunrpc/svcsock.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 0c81202..ec19444 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -414,6 +414,7 @@ static void svc_udp_data_ready(struct sock *sk)
>  		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> +	smp_mb();
>  	if (wq && waitqueue_active(wq))
>  		wake_up_interruptible(wq);
>  }

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread
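Folding the barrier into the helper, as suggested here, would amount to roughly the following userspace model (my sketch: the real kernel helper tests `list_empty(&q->task_list)` and would issue smp_mb() rather than a C11 fence). Every caller would then get the ordering for free, at the cost of a barrier even where the caller already holds a lock.

```c
#include <stdatomic.h>

/* Userspace model of a waitqueue head; the atomic counter stands in for
 * the length of the wait_queue list. */
struct waitqueue_head {
	atomic_int nr_entries;
};

/* The barrier is folded into the check itself, so callers cannot forget it. */
static int waitqueue_active(struct waitqueue_head *q)
{
	atomic_thread_fence(memory_order_seq_cst); /* the proposed built-in smp_mb() */
	return atomic_load(&q->nr_entries) != 0;
}
```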
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc
@ 2015-10-09  6:29   ` Kosuke Tatsukawa
  0 siblings, 0 replies; 28+ messages in thread

From: Kosuke Tatsukawa @ 2015-10-09  6:29 UTC (permalink / raw)
  To: Neil Brown
  Cc: Trond Myklebust, Anna Schumaker, J. Bruce Fields, Jeff Layton,
	David S. Miller, linux-nfs, netdev, linux-kernel

Neil Brown wrote:
> Kosuke Tatsukawa <tatsu@ab.jp.nec.com> writes:
>
>> There are several places in net/sunrpc/svcsock.c which call
>> waitqueue_active() without calling a memory barrier.  Add a memory
>> barrier just as in wq_has_sleeper().
>>
>> I found this issue when I was looking through the linux source code
>> for places calling waitqueue_active() before wake_up*(), but without
>> preceding memory barriers, after sending a patch to fix a similar
>> issue in drivers/tty/n_tty.c  (Details about the original issue can be
>> found here: https://lkml.org/lkml/2015/9/28/849).
>
> hi,
> this feels like the wrong approach to the problem.  It requires extra
> smp_mb()s to be spread around, which are hard to understand and easy to
> forget.
>
> A quick look seems to suggest that (nearly) every waitqueue_active()
> will need an smp_mb().  Could we just put the smp_mb() inside
> waitqueue_active()??
<snip>

There are around 200 occurrences of waitqueue_active() in the kernel
source, and most of the places which use it before wake_up are either
protected by some spin lock, or already have a memory barrier or some
kind of atomic operation before it.

Simply adding smp_mb() to waitqueue_active() would incur extra cost in
many cases and won't be a good idea.

Another way to solve this problem is to remove the waitqueue_active(),
making the code look like this;
	if (wq)
		wake_up_interruptible(wq);
This also fixes the problem because the spinlock in the wake_up*() acts
as a memory barrier and prevents the code from being reordered by the
CPU (and it also makes the resulting code much simpler).
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
                  | IT Platform Division, NEC Corporation
                  | tatsu@ab.jp.nec.com

^ permalink raw reply	[flat|nested] 28+ messages in thread
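The alternative described above, calling the wake primitive unconditionally, can be sketched the same way (again a userspace analogue of my own, not the sunrpc code). The lock acquire/release inside the wake path orders the condition store before any waiter's re-check, so no explicit barrier is needed; the cost is taking the lock even when nobody is waiting.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int cond_ready;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;

/* Wake unconditionally: no "is anyone waiting?" test, so no barrier needed. */
static void waker_unconditional(void)
{
	atomic_store(&cond_ready, 1);  /* make the condition true...            */
	pthread_mutex_lock(&lock);     /* ...the lock orders it before the wake */
	pthread_cond_broadcast(&cv);   /* wakes whoever is there (maybe nobody) */
	pthread_mutex_unlock(&lock);
}

/* Sleep until the condition is true; returns the condition value seen. */
static int waiter_once(void)
{
	int seen;

	pthread_mutex_lock(&lock);
	while (!atomic_load(&cond_ready))
		pthread_cond_wait(&cv, &lock);
	seen = atomic_load(&cond_ready);
	pthread_mutex_unlock(&lock);
	return seen;
}
```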
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc
  2015-10-09  6:29   ` Kosuke Tatsukawa (?)
@ 2015-10-09 21:18     ` J. Bruce Fields
  2015-10-09 21:21       ` Trond Myklebust
  2015-10-12 10:41       ` Kosuke Tatsukawa
  -1 siblings, 2 replies; 28+ messages in thread

From: J. Bruce Fields @ 2015-10-09 21:18 UTC (permalink / raw)
  To: Kosuke Tatsukawa
  Cc: Neil Brown, Trond Myklebust, Anna Schumaker, Jeff Layton,
	David S. Miller, linux-nfs, netdev, linux-kernel

On Fri, Oct 09, 2015 at 06:29:44AM +0000, Kosuke Tatsukawa wrote:
> Neil Brown wrote:
> > Kosuke Tatsukawa <tatsu@ab.jp.nec.com> writes:
> >
> >> There are several places in net/sunrpc/svcsock.c which call
> >> waitqueue_active() without calling a memory barrier.  Add a memory
> >> barrier just as in wq_has_sleeper().
> >>
> >> I found this issue when I was looking through the linux source code
> >> for places calling waitqueue_active() before wake_up*(), but without
> >> preceding memory barriers, after sending a patch to fix a similar
> >> issue in drivers/tty/n_tty.c  (Details about the original issue can be
> >> found here: https://lkml.org/lkml/2015/9/28/849).
> >
> > hi,
> > this feels like the wrong approach to the problem.  It requires extra
> > smp_mb()s to be spread around, which are hard to understand and easy to
> > forget.
> >
> > A quick look seems to suggest that (nearly) every waitqueue_active()
> > will need an smp_mb().  Could we just put the smp_mb() inside
> > waitqueue_active()??
> <snip>
>
> There are around 200 occurrences of waitqueue_active() in the kernel
> source, and most of the places which use it before wake_up are either
> protected by some spin lock, or already have a memory barrier or some
> kind of atomic operation before it.
>
> Simply adding smp_mb() to waitqueue_active() would incur extra cost in
> many cases and won't be a good idea.
>
> Another way to solve this problem is to remove the waitqueue_active(),
> making the code look like this;
> 	if (wq)
> 		wake_up_interruptible(wq);
> This also fixes the problem because the spinlock in the wake_up*() acts
> as a memory barrier and prevents the code from being reordered by the
> CPU (and it also makes the resulting code much simpler).

I might not care which we did, except I don't have the means to test
this quickly, and I guess this is some of our most frequently called
code.

I suppose your patch is the most conservative approach, as the
alternative is a spinlock/unlock in wake_up_interruptible, which I
assume is necessarily more expensive than an smp_mb().

As far as I can tell it's been this way since forever.  (Well, since a
2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which
removed some spinlocks from the data_ready routines.)

I don't understand what the actual race is yet (which code exactly is
missing the wakeup in this case?  nfsd threads seem to instead get
woken up by the wake_up_process() in svc_xprt_do_enqueue().)

--b.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc
  2015-10-09 21:18     ` J. Bruce Fields
@ 2015-10-09 21:21       ` Trond Myklebust
  2015-10-12 10:41       ` Kosuke Tatsukawa
  1 sibling, 0 replies; 28+ messages in thread

From: Trond Myklebust @ 2015-10-09 21:21 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Kosuke Tatsukawa, Neil Brown, Anna Schumaker, Jeff Layton,
	David S. Miller, linux-nfs, netdev, linux-kernel

On Fri, Oct 9, 2015 at 5:18 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Fri, Oct 09, 2015 at 06:29:44AM +0000, Kosuke Tatsukawa wrote:
> > Neil Brown wrote:
> > > Kosuke Tatsukawa <tatsu@ab.jp.nec.com> writes:
> > >
> > >> There are several places in net/sunrpc/svcsock.c which call
> > >> waitqueue_active() without calling a memory barrier.  Add a memory
> > >> barrier just as in wq_has_sleeper().
> > >>
> > >> I found this issue when I was looking through the linux source code
> > >> for places calling waitqueue_active() before wake_up*(), but without
> > >> preceding memory barriers, after sending a patch to fix a similar
> > >> issue in drivers/tty/n_tty.c  (Details about the original issue can be
> > >> found here: https://lkml.org/lkml/2015/9/28/849).
> > >
> > > hi,
> > > this feels like the wrong approach to the problem.  It requires extra
> > > smp_mb()s to be spread around, which are hard to understand and easy to
> > > forget.
> > >
> > > A quick look seems to suggest that (nearly) every waitqueue_active()
> > > will need an smp_mb().  Could we just put the smp_mb() inside
> > > waitqueue_active()??
> > <snip>
> >
> > There are around 200 occurrences of waitqueue_active() in the kernel
> > source, and most of the places which use it before wake_up are either
> > protected by some spin lock, or already have a memory barrier or some
> > kind of atomic operation before it.
> >
> > Simply adding smp_mb() to waitqueue_active() would incur extra cost in
> > many cases and won't be a good idea.
> >
> > Another way to solve this problem is to remove the waitqueue_active(),
> > making the code look like this;
> > 	if (wq)
> > 		wake_up_interruptible(wq);
> > This also fixes the problem because the spinlock in the wake_up*() acts
> > as a memory barrier and prevents the code from being reordered by the
> > CPU (and it also makes the resulting code much simpler).
>
> I might not care which we did, except I don't have the means to test
> this quickly, and I guess this is some of our most frequently called
> code.
>
> I suppose your patch is the most conservative approach, as the
> alternative is a spinlock/unlock in wake_up_interruptible, which I
> assume is necessarily more expensive than an smp_mb().
>
> As far as I can tell it's been this way since forever.  (Well, since a
> 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which
> removed some spinlocks from the data_ready routines.)
>
> I don't understand what the actual race is yet (which code exactly is
> missing the wakeup in this case?  nfsd threads seem to instead get
> woken up by the wake_up_process() in svc_xprt_do_enqueue().)
>

Those threads still use blocking calls for sendpage() and sendmsg(), so
presumably they may be affected.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc
@ 2015-10-12 10:41       ` Kosuke Tatsukawa
  0 siblings, 0 replies; 28+ messages in thread

From: Kosuke Tatsukawa @ 2015-10-12 10:41 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Neil Brown, Trond Myklebust, Anna Schumaker, Jeff Layton,
	David S. Miller, linux-nfs, netdev, linux-kernel

J. Bruce Fields wrote:
> On Fri, Oct 09, 2015 at 06:29:44AM +0000, Kosuke Tatsukawa wrote:
>> Neil Brown wrote:
>> > Kosuke Tatsukawa <tatsu@ab.jp.nec.com> writes:
>> >
>> >> There are several places in net/sunrpc/svcsock.c which call
>> >> waitqueue_active() without calling a memory barrier.  Add a memory
>> >> barrier just as in wq_has_sleeper().
>> >>
>> >> I found this issue when I was looking through the linux source code
>> >> for places calling waitqueue_active() before wake_up*(), but without
>> >> preceding memory barriers, after sending a patch to fix a similar
>> >> issue in drivers/tty/n_tty.c  (Details about the original issue can be
>> >> found here: https://lkml.org/lkml/2015/9/28/849).
>> >
>> > hi,
>> > this feels like the wrong approach to the problem.  It requires extra
>> > smp_mb()s to be spread around, which are hard to understand and easy to
>> > forget.
>> >
>> > A quick look seems to suggest that (nearly) every waitqueue_active()
>> > will need an smp_mb().  Could we just put the smp_mb() inside
>> > waitqueue_active()??
>> <snip>
>>
>> There are around 200 occurrences of waitqueue_active() in the kernel
>> source, and most of the places which use it before wake_up are either
>> protected by some spin lock, or already have a memory barrier or some
>> kind of atomic operation before it.
>>
>> Simply adding smp_mb() to waitqueue_active() would incur extra cost in
>> many cases and won't be a good idea.
>>
>> Another way to solve this problem is to remove the waitqueue_active(),
>> making the code look like this;
>> 	if (wq)
>> 		wake_up_interruptible(wq);
>> This also fixes the problem because the spinlock in the wake_up*() acts
>> as a memory barrier and prevents the code from being reordered by the
>> CPU (and it also makes the resulting code much simpler).
>
> I might not care which we did, except I don't have the means to test
> this quickly, and I guess this is some of our most frequently called
> code.
>
> I suppose your patch is the most conservative approach, as the
> alternative is a spinlock/unlock in wake_up_interruptible, which I
> assume is necessarily more expensive than an smp_mb().
>
> As far as I can tell it's been this way since forever.  (Well, since a
> 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which
> removed some spinlocks from the data_ready routines.)
>
> I don't understand what the actual race is yet (which code exactly is
> missing the wakeup in this case?  nfsd threads seem to instead get
> woken up by the wake_up_process() in svc_xprt_do_enqueue().)

Thank you for the reply.  I tried looking into this.

The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and
svc_udp_init(), which are both called from svc_setup_socket().
svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4
callback port related code.

Maybe I'm wrong, but there might not be any kernel code that is using
the socket's wait queue in this case.

Best regards.
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
                  | IT Platform Division, NEC Corporation
                  | tatsu@ab.jp.nec.com

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc
  2015-10-12 10:41       ` Kosuke Tatsukawa (?)
@ 2015-10-12 20:26         ` J. Bruce Fields
  2015-10-14  3:57           ` Kosuke Tatsukawa
  -1 siblings, 1 reply; 28+ messages in thread

From: J. Bruce Fields @ 2015-10-12 20:26 UTC (permalink / raw)
  To: Kosuke Tatsukawa
  Cc: Neil Brown, Trond Myklebust, Anna Schumaker, Jeff Layton,
	David S. Miller, linux-nfs, netdev, linux-kernel

On Mon, Oct 12, 2015 at 10:41:06AM +0000, Kosuke Tatsukawa wrote:
> J. Bruce Fields wrote:
> > On Fri, Oct 09, 2015 at 06:29:44AM +0000, Kosuke Tatsukawa wrote:
> >> Neil Brown wrote:
> >> > Kosuke Tatsukawa <tatsu@ab.jp.nec.com> writes:
> >> >
> >> >> There are several places in net/sunrpc/svcsock.c which call
> >> >> waitqueue_active() without calling a memory barrier.  Add a memory
> >> >> barrier just as in wq_has_sleeper().
> >> >>
> >> >> I found this issue when I was looking through the linux source code
> >> >> for places calling waitqueue_active() before wake_up*(), but without
> >> >> preceding memory barriers, after sending a patch to fix a similar
> >> >> issue in drivers/tty/n_tty.c  (Details about the original issue can be
> >> >> found here: https://lkml.org/lkml/2015/9/28/849).
> >> >
> >> > hi,
> >> > this feels like the wrong approach to the problem.  It requires extra
> >> > smp_mb()s to be spread around, which are hard to understand and easy to
> >> > forget.
> >> >
> >> > A quick look seems to suggest that (nearly) every waitqueue_active()
> >> > will need an smp_mb().  Could we just put the smp_mb() inside
> >> > waitqueue_active()??
> >> <snip>
> >>
> >> There are around 200 occurrences of waitqueue_active() in the kernel
> >> source, and most of the places which use it before wake_up are either
> >> protected by some spin lock, or already have a memory barrier or some
> >> kind of atomic operation before it.
> >>
> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in
> >> many cases and won't be a good idea.
> >>
> >> Another way to solve this problem is to remove the waitqueue_active(),
> >> making the code look like this;
> >> 	if (wq)
> >> 		wake_up_interruptible(wq);
> >> This also fixes the problem because the spinlock in the wake_up*() acts
> >> as a memory barrier and prevents the code from being reordered by the
> >> CPU (and it also makes the resulting code much simpler).
> >
> > I might not care which we did, except I don't have the means to test
> > this quickly, and I guess this is some of our most frequently called
> > code.
> >
> > I suppose your patch is the most conservative approach, as the
> > alternative is a spinlock/unlock in wake_up_interruptible, which I
> > assume is necessarily more expensive than an smp_mb().
> >
> > As far as I can tell it's been this way since forever.  (Well, since a
> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which
> > removed some spinlocks from the data_ready routines.)
> >
> > I don't understand what the actual race is yet (which code exactly is
> > missing the wakeup in this case?  nfsd threads seem to instead get
> > woken up by the wake_up_process() in svc_xprt_do_enqueue().)
>
> Thank you for the reply.  I tried looking into this.
>
> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and
> svc_udp_init(), which are both called from svc_setup_socket().
> svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4
> callback port related code.
>
> Maybe I'm wrong, but there might not be any kernel code that is using
> the socket's wait queue in this case.

As Trond points out there are probably waiters internal to the
networking code.

--b.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc
  2015-10-12 20:26         ` J. Bruce Fields
@ 2015-10-14  3:57           ` Kosuke Tatsukawa
  2015-10-14 16:00             ` J. Bruce Fields
  0 siblings, 1 reply; 28+ messages in thread

From: Kosuke Tatsukawa @ 2015-10-14  3:57 UTC (permalink / raw)
  To: Trond Myklebust, J. Bruce Fields
  Cc: Neil Brown, Anna Schumaker, Jeff Layton, David S. Miller,
	linux-nfs, netdev, linux-kernel

J. Bruce Fields wrote:
> On Mon, Oct 12, 2015 at 10:41:06AM +0000, Kosuke Tatsukawa wrote:
>> J. Bruce Fields wrote:
>> > On Fri, Oct 09, 2015 at 06:29:44AM +0000, Kosuke Tatsukawa wrote:
>> >> Neil Brown wrote:
>> >> > Kosuke Tatsukawa <tatsu@ab.jp.nec.com> writes:
>> >> >
>> >> >> There are several places in net/sunrpc/svcsock.c which call
>> >> >> waitqueue_active() without calling a memory barrier.  Add a memory
>> >> >> barrier just as in wq_has_sleeper().
>> >> >>
>> >> >> I found this issue when I was looking through the linux source code
>> >> >> for places calling waitqueue_active() before wake_up*(), but without
>> >> >> preceding memory barriers, after sending a patch to fix a similar
>> >> >> issue in drivers/tty/n_tty.c  (Details about the original issue can be
>> >> >> found here: https://lkml.org/lkml/2015/9/28/849).
>> >> >
>> >> > hi,
>> >> > this feels like the wrong approach to the problem.  It requires extra
>> >> > smp_mb()s to be spread around, which are hard to understand and easy to
>> >> > forget.
>> >> >
>> >> > A quick look seems to suggest that (nearly) every waitqueue_active()
>> >> > will need an smp_mb().  Could we just put the smp_mb() inside
>> >> > waitqueue_active()??
>> >> <snip>
>> >>
>> >> There are around 200 occurrences of waitqueue_active() in the kernel
>> >> source, and most of the places which use it before wake_up are either
>> >> protected by some spin lock, or already have a memory barrier or some
>> >> kind of atomic operation before it.
>> >>
>> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in
>> >> many cases and won't be a good idea.
>> >>
>> >> Another way to solve this problem is to remove the waitqueue_active(),
>> >> making the code look like this;
>> >> 	if (wq)
>> >> 		wake_up_interruptible(wq);
>> >> This also fixes the problem because the spinlock in the wake_up*() acts
>> >> as a memory barrier and prevents the code from being reordered by the
>> >> CPU (and it also makes the resulting code much simpler).
>> >
>> > I might not care which we did, except I don't have the means to test
>> > this quickly, and I guess this is some of our most frequently called
>> > code.
>> >
>> > I suppose your patch is the most conservative approach, as the
>> > alternative is a spinlock/unlock in wake_up_interruptible, which I
>> > assume is necessarily more expensive than an smp_mb().
>> >
>> > As far as I can tell it's been this way since forever.  (Well, since a
>> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which
>> > removed some spinlocks from the data_ready routines.)
>> >
>> > I don't understand what the actual race is yet (which code exactly is
>> > missing the wakeup in this case?  nfsd threads seem to instead get
>> > woken up by the wake_up_process() in svc_xprt_do_enqueue().)
>>
>> Thank you for the reply.  I tried looking into this.
>>
>> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and
>> svc_udp_init(), which are both called from svc_setup_socket().
>> svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4
>> callback port related code.
>>
>> Maybe I'm wrong, but there might not be any kernel code that is using
>> the socket's wait queue in this case.
>
> As Trond points out there are probably waiters internal to the
> networking code.

Trond and Bruce, thank you for the comment.  I was able to find the call
to the wait function that was called from nfsd.

sk_stream_wait_connect() and sk_stream_wait_memory() were called from
either do_tcp_sendpages() or tcp_sendmsg() called from within
svc_send().  sk_stream_wait_connect() shouldn't be called at this point,
because the socket has already been used to receive the rpc request.

On the wake_up side, sk_write_space() is called from the following
locations.  The relevant ones seem to be preceded by atomic_sub or a
memory barrier.
+ ksocknal_write_space [drivers/staging/lustre/lnet/klnds/socklnd/socklnd_lib.c:633]
+ atm_pop_raw [net/atm/raw.c:40]
+ sock_setsockopt [net/core/sock.c:740]
+ sock_wfree [net/core/sock.c:1630]
    Preceded by atomic_sub in sock_wfree()
+ ccid3_hc_tx_packet_recv [net/dccp/ccids/ccid3.c:442]
+ do_tcp_sendpages [net/ipv4/tcp.c:1008]
+ tcp_sendmsg [net/ipv4/tcp.c:1300]
+ do_tcp_setsockopt [net/ipv4/tcp.c:2597]
+ tcp_new_space [net/ipv4/tcp_input.c:4885]
    Preceded by smp_mb__after_atomic in tcp_check_space()
+ llc_conn_state_process [net/llc/llc_conn.c:148]
+ pipe_rcv_status [net/phonet/pep.c:312]
+ pipe_do_rcv [net/phonet/pep.c:440]
+ pipe_start_flow_control [net/phonet/pep.c:554]
+ svc_sock_setbufsize [net/sunrpc/svcsock.c:45]

sk_state_change() calls related to TCP/IP were called from the following
places.
+ inet_shutdown [net/ipv4/af_inet.c:825]
    This shouldn't be called when waiting
+ tcp_done [net/ipv4/tcp.c:3078]
    spin_lock*/spin_unlock* is called in lock_timer_base
+ tcp_fin [net/ipv4/tcp_input.c:4031]
    atomic_long_sub is called from sk_memory_allocated_sub called within
    sk_mem_reclaim
+ tcp_finish_connect [net/ipv4/tcp_input.c:5415]
    This shouldn't be called when waiting
+ tcp_rcv_state_process [net/ipv4/tcp_input.c:5807,5880]
    The socket shouldn't be in TCP_SYN_RECV nor TCP_FIN_WAIT1 states
    when waiting

I think the wait queue won't be used to wake nfsd from
svc_{tcp,udp}_data_ready, because nfsd doesn't read from a socket.

So with the current implementation, it seems there shouldn't be any
problems even if the memory barrier is missing.
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
                  | IT Platform Division, NEC Corporation
                  | tatsu@ab.jp.nec.com

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc @ 2015-10-14 16:00 ` J. Bruce Fields 0 siblings, 0 replies; 28+ messages in thread From: J. Bruce Fields @ 2015-10-14 16:00 UTC (permalink / raw) To: Kosuke Tatsukawa Cc: Trond Myklebust, Neil Brown, Anna Schumaker, Jeff Layton, David S. Miller, linux-nfs, netdev, linux-kernel On Wed, Oct 14, 2015 at 03:57:13AM +0000, Kosuke Tatsukawa wrote: > J. Bruce Fields wrote: > > On Mon, Oct 12, 2015 at 10:41:06AM +0000, Kosuke Tatsukawa wrote: > >> J. Bruce Fields wrote: > >> > On Fri, Oct 09, 2015 at 06:29:44AM +0000, Kosuke Tatsukawa wrote: > >> >> Neil Brown wrote: > >> >> > Kosuke Tatsukawa <tatsu@ab.jp.nec.com> writes: > >> >> > > >> >> >> There are several places in net/sunrpc/svcsock.c which calls > >> >> >> waitqueue_active() without calling a memory barrier. Add a memory > >> >> >> barrier just as in wq_has_sleeper(). > >> >> >> > >> >> >> I found this issue when I was looking through the linux source code > >> >> >> for places calling waitqueue_active() before wake_up*(), but without > >> >> >> preceding memory barriers, after sending a patch to fix a similar > >> >> >> issue in drivers/tty/n_tty.c (Details about the original issue can be > >> >> >> found here: https://lkml.org/lkml/2015/9/28/849). > >> >> > > >> >> > hi, > >> >> > this feels like the wrong approach to the problem. It requires extra > >> >> > 'smb_mb's to be spread around which are hard to understand as easy to > >> >> > forget. > >> >> > > >> >> > A quick look seems to suggest that (nearly) every waitqueue_active() > >> >> > will need an smb_mb. Could we just put the smb_mb() inside > >> >> > waitqueue_active()?? > >> >> <snip> > >> >> > >> >> There are around 200 occurrences of waitqueue_active() in the kernel > >> >> source, and most of the places which use it before wake_up are either > >> >> protected by some spin lock, or already has a memory barrier or some > >> >> kind of atomic operation before it. 
> >> >> > >> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in > >> >> many cases and won't be a good idea. > >> >> > >> >> Another way to solve this problem is to remove the waitqueue_active(), > >> >> making the code look like this; > >> >> if (wq) > >> >> wake_up_interruptible(wq); > >> >> This also fixes the problem because the spinlock in the wake_up*() acts > >> >> as a memory barrier and prevents the code from being reordered by the > >> >> CPU (and it also makes the resulting code is much simpler). > >> > > >> > I might not care which we did, except I don't have the means to test > >> > this quickly, and I guess this is some of our most frequently called > >> > code. > >> > > >> > I suppose your patch is the most conservative approach, as the > >> > alternative is a spinlock/unlock in wake_up_interruptible, which I > >> > assume is necessarily more expensive than an smp_mb(). > >> > > >> > As far as I can tell it's been this way since forever. (Well, since a > >> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which > >> > removed some spinlocks from the data_ready routines.) > >> > > >> > I don't understand what the actual race is yet (which code exactly is > >> > missing the wakeup in this case? nfsd threads seem to instead get > >> > woken up by the wake_up_process() in svc_xprt_do_enqueue().) > >> > >> Thank you for the reply. I tried looking into this. > >> > >> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and > >> svc_udp_init(), which are both called from svc_setup_socket(). > >> svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4 > >> callback port related code. > >> > >> Maybe I'm wrong, but there might not be any kernel code that is using > >> the socket's wait queue in this case. > > > > As Trond points out there are probably waiters internal to the > > networking code. > > Trond and Bruce, thank you for the comment. 
I was able to find the call > to the wait function that was called from nfsd. > > sk_stream_wait_connect() and sk_stream_wait_memory() were called from > either do_tcp_sendpages() or tcp_sendmsg() called from within > svc_send(). sk_stream_wait_connect() shouldn't be called at this point, > because the socket has already been used to receive the rpc request. > > On the wake_up side, sk_write_space() is called from the following > locations. The relevant ones seems to be preceded by atomic_sub or a > memory barrier. > + ksocknal_write_space [drivers/staging/lustre/lnet/klnds/socklnd/socklnd_lib.c:633] > + atm_pop_raw [net/atm/raw.c:40] > + sock_setsockopt [net/core/sock.c:740] > + sock_wfree [net/core/sock.c:1630] > Preceded by atomic_sub in sock_wfree() > + ccid3_hc_tx_packet_recv [net/dccp/ccids/ccid3.c:442] > + do_tcp_sendpages [net/ipv4/tcp.c:1008] > + tcp_sendmsg [net/ipv4/tcp.c:1300] > + do_tcp_setsockopt [net/ipv4/tcp.c:2597] > + tcp_new_space [net/ipv4/tcp_input.c:4885] > Preceded by smp_mb__after_atomic in tcp_check_space() > + llc_conn_state_process [net/llc/llc_conn.c:148] > + pipe_rcv_status [net/phonet/pep.c:312] > + pipe_do_rcv [net/phonet/pep.c:440] > + pipe_start_flow_control [net/phonet/pep.c:554] > + svc_sock_setbufsize [net/sunrpc/svcsock.c:45] > > sk_state_change() calls related to TCP/IP were called from the following > places. 
> + inet_shutdown [net/ipv4/af_inet.c:825] > This shouldn't be called when waiting > + tcp_done [net/ipv4/tcp.c:3078] > spin_lock*/spin_unlock* is called in lock_timer_base > + tcp_fin [net/ipv4/tcp_input.c:4031] > atomic_long_sub is called from sk_memory_allocated_sub called within > sk_mem_reclaim > + tcp_finish_connect [net/ipv4/tcp_input.c:5415] > This shoudn't be called when waiting > + tcp_rcv_state_process [net/ipv4/tcp_input.c:5807,5880] > The socket shouldn't be in TCP_SYN_RECV nor TCP_FIN_WAIT1 states when > waiting > > I think the wait queue won't be used for being woken up by > svc_{tcp,udp}_data_ready, because nfsd doesn't read from a socket. Looking, well, I guess kernel_recvmsg() does read from a socket, but I assume calling with MSG_DONTWAIT means that it doesn't block. > So with the current implementation, it seems there shouldn't be any > problems even if the memory barrier is missing. Thanks for the detailed investigation. I think it would be worth adding a comment if that might help someone having to reinvestigate this again some day. --b. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc 2015-10-14 16:00 ` J. Bruce Fields (?) @ 2015-10-15 0:09 ` Kosuke Tatsukawa 2015-10-15 11:44 ` Kosuke Tatsukawa -1 siblings, 1 reply; 28+ messages in thread From: Kosuke Tatsukawa @ 2015-10-15 0:09 UTC (permalink / raw) To: J. Bruce Fields Cc: Trond Myklebust, Neil Brown, Anna Schumaker, Jeff Layton, David S. Miller, linux-nfs, netdev, linux-kernel J. Bruce Fields wrote: > On Wed, Oct 14, 2015 at 03:57:13AM +0000, Kosuke Tatsukawa wrote: >> J. Bruce Fields wrote: >> > On Mon, Oct 12, 2015 at 10:41:06AM +0000, Kosuke Tatsukawa wrote: >> >> J. Bruce Fields wrote: >> >> > On Fri, Oct 09, 2015 at 06:29:44AM +0000, Kosuke Tatsukawa wrote: >> >> >> Neil Brown wrote: >> >> >> > Kosuke Tatsukawa <tatsu@ab.jp.nec.com> writes: >> >> >> > >> >> >> >> There are several places in net/sunrpc/svcsock.c which calls >> >> >> >> waitqueue_active() without calling a memory barrier. Add a memory >> >> >> >> barrier just as in wq_has_sleeper(). >> >> >> >> >> >> >> >> I found this issue when I was looking through the linux source code >> >> >> >> for places calling waitqueue_active() before wake_up*(), but without >> >> >> >> preceding memory barriers, after sending a patch to fix a similar >> >> >> >> issue in drivers/tty/n_tty.c (Details about the original issue can be >> >> >> >> found here: https://lkml.org/lkml/2015/9/28/849). >> >> >> > >> >> >> > hi, >> >> >> > this feels like the wrong approach to the problem. It requires extra >> >> >> > 'smb_mb's to be spread around which are hard to understand as easy to >> >> >> > forget. >> >> >> > >> >> >> > A quick look seems to suggest that (nearly) every waitqueue_active() >> >> >> > will need an smb_mb. Could we just put the smb_mb() inside >> >> >> > waitqueue_active()?? 
>> >> >> <snip> >> >> >> >> >> >> There are around 200 occurrences of waitqueue_active() in the kernel >> >> >> source, and most of the places which use it before wake_up are either >> >> >> protected by some spin lock, or already has a memory barrier or some >> >> >> kind of atomic operation before it. >> >> >> >> >> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in >> >> >> many cases and won't be a good idea. >> >> >> >> >> >> Another way to solve this problem is to remove the waitqueue_active(), >> >> >> making the code look like this; >> >> >> if (wq) >> >> >> wake_up_interruptible(wq); >> >> >> This also fixes the problem because the spinlock in the wake_up*() acts >> >> >> as a memory barrier and prevents the code from being reordered by the >> >> >> CPU (and it also makes the resulting code is much simpler). >> >> > >> >> > I might not care which we did, except I don't have the means to test >> >> > this quickly, and I guess this is some of our most frequently called >> >> > code. >> >> > >> >> > I suppose your patch is the most conservative approach, as the >> >> > alternative is a spinlock/unlock in wake_up_interruptible, which I >> >> > assume is necessarily more expensive than an smp_mb(). >> >> > >> >> > As far as I can tell it's been this way since forever. (Well, since a >> >> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which >> >> > removed some spinlocks from the data_ready routines.) >> >> > >> >> > I don't understand what the actual race is yet (which code exactly is >> >> > missing the wakeup in this case? nfsd threads seem to instead get >> >> > woken up by the wake_up_process() in svc_xprt_do_enqueue().) >> >> >> >> Thank you for the reply. I tried looking into this. >> >> >> >> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and >> >> svc_udp_init(), which are both called from svc_setup_socket(). 
>> >> svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4 >> >> callback port related code. >> >> >> >> Maybe I'm wrong, but there might not be any kernel code that is using >> >> the socket's wait queue in this case. >> > >> > As Trond points out there are probably waiters internal to the >> > networking code. >> >> Trond and Bruce, thank you for the comment. I was able to find the call >> to the wait function that was called from nfsd. >> >> sk_stream_wait_connect() and sk_stream_wait_memory() were called from >> either do_tcp_sendpages() or tcp_sendmsg() called from within >> svc_send(). sk_stream_wait_connect() shouldn't be called at this point, >> because the socket has already been used to receive the rpc request. >> >> On the wake_up side, sk_write_space() is called from the following >> locations. The relevant ones seems to be preceded by atomic_sub or a >> memory barrier. >> + ksocknal_write_space [drivers/staging/lustre/lnet/klnds/socklnd/socklnd_lib.c:633] >> + atm_pop_raw [net/atm/raw.c:40] >> + sock_setsockopt [net/core/sock.c:740] >> + sock_wfree [net/core/sock.c:1630] >> Preceded by atomic_sub in sock_wfree() >> + ccid3_hc_tx_packet_recv [net/dccp/ccids/ccid3.c:442] >> + do_tcp_sendpages [net/ipv4/tcp.c:1008] >> + tcp_sendmsg [net/ipv4/tcp.c:1300] >> + do_tcp_setsockopt [net/ipv4/tcp.c:2597] >> + tcp_new_space [net/ipv4/tcp_input.c:4885] >> Preceded by smp_mb__after_atomic in tcp_check_space() >> + llc_conn_state_process [net/llc/llc_conn.c:148] >> + pipe_rcv_status [net/phonet/pep.c:312] >> + pipe_do_rcv [net/phonet/pep.c:440] >> + pipe_start_flow_control [net/phonet/pep.c:554] >> + svc_sock_setbufsize [net/sunrpc/svcsock.c:45] >> >> sk_state_change() calls related to TCP/IP were called from the following >> places. 
>> + inet_shutdown [net/ipv4/af_inet.c:825]
>>     This shouldn't be called when waiting
>> + tcp_done [net/ipv4/tcp.c:3078]
>>     spin_lock*/spin_unlock* is called in lock_timer_base
>> + tcp_fin [net/ipv4/tcp_input.c:4031]
>>     atomic_long_sub is called from sk_memory_allocated_sub called within
>>     sk_mem_reclaim
>> + tcp_finish_connect [net/ipv4/tcp_input.c:5415]
>>     This shouldn't be called when waiting
>> + tcp_rcv_state_process [net/ipv4/tcp_input.c:5807,5880]
>>     The socket shouldn't be in TCP_SYN_RECV nor TCP_FIN_WAIT1 states when
>>     waiting
>>
>> I think the wait queue won't be used for being woken up by
>> svc_{tcp,udp}_data_ready, because nfsd doesn't read from a socket.
>
> Looking, well, I guess kernel_recvmsg() does read from a socket, but I
> assume calling with MSG_DONTWAIT means that it doesn't block.
>
>> So with the current implementation, it seems there shouldn't be any
>> problems even if the memory barrier is missing.
>
> Thanks for the detailed investigation.
>
> I think it would be worth adding a comment if that might help someone
> having to reinvestigate this again some day.

It would be nice, but I find it difficult to write a comment in the
sunrpc layer explaining why a memory barrier isn't necessary, because
the explanation depends both on how nfsd uses the socket and on the
current implementation of the network code.

Personally, I would prefer removing the call to waitqueue_active(),
which would make the memory barrier totally unnecessary, at the cost of
a spin_lock + spin_unlock pair from unconditionally calling
wake_up_interruptible().
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
                  | IT Platform Division, NEC Corporation
                  | tatsu@ab.jp.nec.com
^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc @ 2015-10-15 11:44 ` Kosuke Tatsukawa 0 siblings, 0 replies; 28+ messages in thread From: Kosuke Tatsukawa @ 2015-10-15 11:44 UTC (permalink / raw) To: J. Bruce Fields Cc: Trond Myklebust, Neil Brown, Anna Schumaker, Jeff Layton, David S. Miller, linux-nfs, netdev, linux-kernel Tatsukawa Kosuke wrote: > J. Bruce Fields wrote: >> On Wed, Oct 14, 2015 at 03:57:13AM +0000, Kosuke Tatsukawa wrote: >>> J. Bruce Fields wrote: >>> > On Mon, Oct 12, 2015 at 10:41:06AM +0000, Kosuke Tatsukawa wrote: >>> >> J. Bruce Fields wrote: >>> >> > On Fri, Oct 09, 2015 at 06:29:44AM +0000, Kosuke Tatsukawa wrote: >>> >> >> Neil Brown wrote: >>> >> >> > Kosuke Tatsukawa <tatsu@ab.jp.nec.com> writes: >>> >> >> > >>> >> >> >> There are several places in net/sunrpc/svcsock.c which calls >>> >> >> >> waitqueue_active() without calling a memory barrier. Add a memory >>> >> >> >> barrier just as in wq_has_sleeper(). >>> >> >> >> >>> >> >> >> I found this issue when I was looking through the linux source code >>> >> >> >> for places calling waitqueue_active() before wake_up*(), but without >>> >> >> >> preceding memory barriers, after sending a patch to fix a similar >>> >> >> >> issue in drivers/tty/n_tty.c (Details about the original issue can be >>> >> >> >> found here: https://lkml.org/lkml/2015/9/28/849). >>> >> >> > >>> >> >> > hi, >>> >> >> > this feels like the wrong approach to the problem. It requires extra >>> >> >> > 'smb_mb's to be spread around which are hard to understand as easy to >>> >> >> > forget. >>> >> >> > >>> >> >> > A quick look seems to suggest that (nearly) every waitqueue_active() >>> >> >> > will need an smb_mb. Could we just put the smb_mb() inside >>> >> >> > waitqueue_active()?? 
>>> >> >> <snip> >>> >> >> >>> >> >> There are around 200 occurrences of waitqueue_active() in the kernel >>> >> >> source, and most of the places which use it before wake_up are either >>> >> >> protected by some spin lock, or already has a memory barrier or some >>> >> >> kind of atomic operation before it. >>> >> >> >>> >> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in >>> >> >> many cases and won't be a good idea. >>> >> >> >>> >> >> Another way to solve this problem is to remove the waitqueue_active(), >>> >> >> making the code look like this; >>> >> >> if (wq) >>> >> >> wake_up_interruptible(wq); >>> >> >> This also fixes the problem because the spinlock in the wake_up*() acts >>> >> >> as a memory barrier and prevents the code from being reordered by the >>> >> >> CPU (and it also makes the resulting code is much simpler). >>> >> > >>> >> > I might not care which we did, except I don't have the means to test >>> >> > this quickly, and I guess this is some of our most frequently called >>> >> > code. >>> >> > >>> >> > I suppose your patch is the most conservative approach, as the >>> >> > alternative is a spinlock/unlock in wake_up_interruptible, which I >>> >> > assume is necessarily more expensive than an smp_mb(). >>> >> > >>> >> > As far as I can tell it's been this way since forever. (Well, since a >>> >> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which >>> >> > removed some spinlocks from the data_ready routines.) >>> >> > >>> >> > I don't understand what the actual race is yet (which code exactly is >>> >> > missing the wakeup in this case? nfsd threads seem to instead get >>> >> > woken up by the wake_up_process() in svc_xprt_do_enqueue().) >>> >> >>> >> Thank you for the reply. I tried looking into this. >>> >> >>> >> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and >>> >> svc_udp_init(), which are both called from svc_setup_socket(). 
>>> >> svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4 >>> >> callback port related code. >>> >> >>> >> Maybe I'm wrong, but there might not be any kernel code that is using >>> >> the socket's wait queue in this case. >>> > >>> > As Trond points out there are probably waiters internal to the >>> > networking code. >>> >>> Trond and Bruce, thank you for the comment. I was able to find the call >>> to the wait function that was called from nfsd. >>> >>> sk_stream_wait_connect() and sk_stream_wait_memory() were called from >>> either do_tcp_sendpages() or tcp_sendmsg() called from within >>> svc_send(). sk_stream_wait_connect() shouldn't be called at this point, >>> because the socket has already been used to receive the rpc request. >>> >>> On the wake_up side, sk_write_space() is called from the following >>> locations. The relevant ones seems to be preceded by atomic_sub or a >>> memory barrier. >>> + ksocknal_write_space [drivers/staging/lustre/lnet/klnds/socklnd/socklnd_lib.c:633] >>> + atm_pop_raw [net/atm/raw.c:40] >>> + sock_setsockopt [net/core/sock.c:740] >>> + sock_wfree [net/core/sock.c:1630] >>> Preceded by atomic_sub in sock_wfree() >>> + ccid3_hc_tx_packet_recv [net/dccp/ccids/ccid3.c:442] >>> + do_tcp_sendpages [net/ipv4/tcp.c:1008] >>> + tcp_sendmsg [net/ipv4/tcp.c:1300] >>> + do_tcp_setsockopt [net/ipv4/tcp.c:2597] >>> + tcp_new_space [net/ipv4/tcp_input.c:4885] >>> Preceded by smp_mb__after_atomic in tcp_check_space() >>> + llc_conn_state_process [net/llc/llc_conn.c:148] >>> + pipe_rcv_status [net/phonet/pep.c:312] >>> + pipe_do_rcv [net/phonet/pep.c:440] >>> + pipe_start_flow_control [net/phonet/pep.c:554] >>> + svc_sock_setbufsize [net/sunrpc/svcsock.c:45] >>> >>> sk_state_change() calls related to TCP/IP were called from the following >>> places. 
>>> + inet_shutdown [net/ipv4/af_inet.c:825] >>> This shouldn't be called when waiting >>> + tcp_done [net/ipv4/tcp.c:3078] >>> spin_lock*/spin_unlock* is called in lock_timer_base >>> + tcp_fin [net/ipv4/tcp_input.c:4031] >>> atomic_long_sub is called from sk_memory_allocated_sub called within >>> sk_mem_reclaim >>> + tcp_finish_connect [net/ipv4/tcp_input.c:5415] >>> This shoudn't be called when waiting >>> + tcp_rcv_state_process [net/ipv4/tcp_input.c:5807,5880] >>> The socket shouldn't be in TCP_SYN_RECV nor TCP_FIN_WAIT1 states when >>> waiting >>> >>> I think the wait queue won't be used for being woken up by >>> svc_{tcp,udp}_data_ready, because nfsd doesn't read from a socket. >> >> Looking, well, I guess kernel_recvmsg() does read from a socket, but I >> assume calling with MSG_DONTWAIT means that it doesn't block. >> >>> So with the current implementation, it seems there shouldn't be any >>> problems even if the memory barrier is missing. >> >> Thanks for the detailed investigation. >> >> I think it would be worth adding a comment if that might help someone >> having to reinvestigate this again some day. > > It would be nice, but I find it difficult to write a comment in the > sunrpc layer why a memory barrier isn't necessary, using the knowledge > of how nfsd uses it, and the current implementation of the network code. > > Personally, I would prefer removing the call to waitqueue_active() which > would make the memory barrier totally unnecessary at the cost of a > spin_lock + spin_unlock by unconditionally calling > wake_up_interruptible. On second thought, the callbacks will be called frequently from the tcp code, so it wouldn't be a good idea. --- Kosuke TATSUKAWA | 3rd IT Platform Department | IT Platform Division, NEC Corporation | tatsu@ab.jp.nec.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc @ 2015-10-15 11:44 ` Kosuke Tatsukawa 0 siblings, 0 replies; 28+ messages in thread From: Kosuke Tatsukawa @ 2015-10-15 11:44 UTC (permalink / raw) To: J. Bruce Fields Cc: Trond Myklebust, Neil Brown, Anna Schumaker, Jeff Layton, David S. Miller, linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Tatsukawa Kosuke wrote: > J. Bruce Fields wrote: >> On Wed, Oct 14, 2015 at 03:57:13AM +0000, Kosuke Tatsukawa wrote: >>> J. Bruce Fields wrote: >>> > On Mon, Oct 12, 2015 at 10:41:06AM +0000, Kosuke Tatsukawa wrote: >>> >> J. Bruce Fields wrote: >>> >> > On Fri, Oct 09, 2015 at 06:29:44AM +0000, Kosuke Tatsukawa wrote: >>> >> >> Neil Brown wrote: >>> >> >> > Kosuke Tatsukawa <tatsu-zZGIbrA41Td8UrSeD/g0lQ@public.gmane.org> writes: >>> >> >> > >>> >> >> >> There are several places in net/sunrpc/svcsock.c which calls >>> >> >> >> waitqueue_active() without calling a memory barrier. Add a memory >>> >> >> >> barrier just as in wq_has_sleeper(). >>> >> >> >> >>> >> >> >> I found this issue when I was looking through the linux source code >>> >> >> >> for places calling waitqueue_active() before wake_up*(), but without >>> >> >> >> preceding memory barriers, after sending a patch to fix a similar >>> >> >> >> issue in drivers/tty/n_tty.c (Details about the original issue can be >>> >> >> >> found here: https://lkml.org/lkml/2015/9/28/849). >>> >> >> > >>> >> >> > hi, >>> >> >> > this feels like the wrong approach to the problem. It requires extra >>> >> >> > 'smb_mb's to be spread around which are hard to understand as easy to >>> >> >> > forget. >>> >> >> > >>> >> >> > A quick look seems to suggest that (nearly) every waitqueue_active() >>> >> >> > will need an smb_mb. Could we just put the smb_mb() inside >>> >> >> > waitqueue_active()?? 
>>> >> >> <snip> >>> >> >> >>> >> >> There are around 200 occurrences of waitqueue_active() in the kernel >>> >> >> source, and most of the places which use it before wake_up are either >>> >> >> protected by some spin lock, or already has a memory barrier or some >>> >> >> kind of atomic operation before it. >>> >> >> >>> >> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in >>> >> >> many cases and won't be a good idea. >>> >> >> >>> >> >> Another way to solve this problem is to remove the waitqueue_active(), >>> >> >> making the code look like this; >>> >> >> if (wq) >>> >> >> wake_up_interruptible(wq); >>> >> >> This also fixes the problem because the spinlock in the wake_up*() acts >>> >> >> as a memory barrier and prevents the code from being reordered by the >>> >> >> CPU (and it also makes the resulting code is much simpler). >>> >> > >>> >> > I might not care which we did, except I don't have the means to test >>> >> > this quickly, and I guess this is some of our most frequently called >>> >> > code. >>> >> > >>> >> > I suppose your patch is the most conservative approach, as the >>> >> > alternative is a spinlock/unlock in wake_up_interruptible, which I >>> >> > assume is necessarily more expensive than an smp_mb(). >>> >> > >>> >> > As far as I can tell it's been this way since forever. (Well, since a >>> >> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which >>> >> > removed some spinlocks from the data_ready routines.) >>> >> > >>> >> > I don't understand what the actual race is yet (which code exactly is >>> >> > missing the wakeup in this case? nfsd threads seem to instead get >>> >> > woken up by the wake_up_process() in svc_xprt_do_enqueue().) >>> >> >>> >> Thank you for the reply. I tried looking into this. >>> >> >>> >> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and >>> >> svc_udp_init(), which are both called from svc_setup_socket(). 
>>> >> svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4 >>> >> callback port related code. >>> >> >>> >> Maybe I'm wrong, but there might not be any kernel code that is using >>> >> the socket's wait queue in this case. >>> > >>> > As Trond points out there are probably waiters internal to the >>> > networking code. >>> >>> Trond and Bruce, thank you for the comment. I was able to find the call >>> to the wait function that was called from nfsd. >>> >>> sk_stream_wait_connect() and sk_stream_wait_memory() were called from >>> either do_tcp_sendpages() or tcp_sendmsg() called from within >>> svc_send(). sk_stream_wait_connect() shouldn't be called at this point, >>> because the socket has already been used to receive the rpc request. >>> >>> On the wake_up side, sk_write_space() is called from the following >>> locations. The relevant ones seems to be preceded by atomic_sub or a >>> memory barrier. >>> + ksocknal_write_space [drivers/staging/lustre/lnet/klnds/socklnd/socklnd_lib.c:633] >>> + atm_pop_raw [net/atm/raw.c:40] >>> + sock_setsockopt [net/core/sock.c:740] >>> + sock_wfree [net/core/sock.c:1630] >>> Preceded by atomic_sub in sock_wfree() >>> + ccid3_hc_tx_packet_recv [net/dccp/ccids/ccid3.c:442] >>> + do_tcp_sendpages [net/ipv4/tcp.c:1008] >>> + tcp_sendmsg [net/ipv4/tcp.c:1300] >>> + do_tcp_setsockopt [net/ipv4/tcp.c:2597] >>> + tcp_new_space [net/ipv4/tcp_input.c:4885] >>> Preceded by smp_mb__after_atomic in tcp_check_space() >>> + llc_conn_state_process [net/llc/llc_conn.c:148] >>> + pipe_rcv_status [net/phonet/pep.c:312] >>> + pipe_do_rcv [net/phonet/pep.c:440] >>> + pipe_start_flow_control [net/phonet/pep.c:554] >>> + svc_sock_setbufsize [net/sunrpc/svcsock.c:45] >>> >>> sk_state_change() calls related to TCP/IP were called from the following >>> places. 
>>> + inet_shutdown [net/ipv4/af_inet.c:825] >>> This shouldn't be called when waiting >>> + tcp_done [net/ipv4/tcp.c:3078] >>> spin_lock*/spin_unlock* is called in lock_timer_base >>> + tcp_fin [net/ipv4/tcp_input.c:4031] >>> atomic_long_sub is called from sk_memory_allocated_sub called within >>> sk_mem_reclaim >>> + tcp_finish_connect [net/ipv4/tcp_input.c:5415] >>> This shouldn't be called when waiting >>> + tcp_rcv_state_process [net/ipv4/tcp_input.c:5807,5880] >>> The socket shouldn't be in TCP_SYN_RECV nor TCP_FIN_WAIT1 states when >>> waiting >>> >>> I think the wait queue won't be used for being woken up by >>> svc_{tcp,udp}_data_ready, because nfsd doesn't read from a socket. >> >> Looking, well, I guess kernel_recvmsg() does read from a socket, but I >> assume calling with MSG_DONTWAIT means that it doesn't block. >> >>> So with the current implementation, it seems there shouldn't be any >>> problems even if the memory barrier is missing. >> >> Thanks for the detailed investigation. >> >> I think it would be worth adding a comment if that might help someone >> having to reinvestigate this again some day. > > It would be nice, but I find it difficult to write a comment in the > sunrpc layer why a memory barrier isn't necessary, using the knowledge > of how nfsd uses it, and the current implementation of the network code. > > Personally, I would prefer removing the call to waitqueue_active() which > would make the memory barrier totally unnecessary at the cost of a > spin_lock + spin_unlock by unconditionally calling > wake_up_interruptible. On second thought, the callbacks will be called frequently from the tcp code, so it wouldn't be a good idea.
--- Kosuke TATSUKAWA | 3rd IT Platform Department | IT Platform Division, NEC Corporation | tatsu-zZGIbrA41Td8UrSeD/g0lQ@public.gmane.org ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc 2015-10-15 11:44 ` Kosuke Tatsukawa (?) @ 2015-10-15 20:57 ` J. Bruce Fields 2015-10-16 0:49 ` Neil Brown 2015-10-16 1:46 ` Kosuke Tatsukawa -1 siblings, 2 replies; 28+ messages in thread From: J. Bruce Fields @ 2015-10-15 20:57 UTC (permalink / raw) To: Kosuke Tatsukawa Cc: Trond Myklebust, Neil Brown, Anna Schumaker, Jeff Layton, David S. Miller, linux-nfs, netdev, linux-kernel On Thu, Oct 15, 2015 at 11:44:20AM +0000, Kosuke Tatsukawa wrote: > Tatsukawa Kosuke wrote: > > J. Bruce Fields wrote: > >> Thanks for the detailed investigation. > >> > >> I think it would be worth adding a comment if that might help someone > >> having to reinvestigate this again some day. > > > > It would be nice, but I find it difficult to write a comment in the > > sunrpc layer why a memory barrier isn't necessary, using the knowledge > > of how nfsd uses it, and the current implementation of the network code. > > > > Personally, I would prefer removing the call to waitqueue_active() which > > would make the memory barrier totally unnecessary at the cost of a > > spin_lock + spin_unlock by unconditionally calling > > wake_up_interruptible. > > On second thought, the callbacks will be called frequently from the tcp > code, so it wouldn't be a good idea. So, I was even considering documenting it like this, if it's not overkill. Hmm... but if this is right, then we may as well ask why we're doing the wakeups at all. Might be educational to test the code with them removed. --b. commit 0882cfeb39e0 Author: J. Bruce Fields <bfields@redhat.com> Date: Thu Oct 15 16:53:41 2015 -0400 svcrpc: document lack of some memory barriers. Kosuke Tatsukawa points out an odd lack of memory barriers in some sites here. I think the code's correct, but it's probably worth documenting. 
Reported-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 856407fa085e..90480993ec4a 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -399,6 +399,25 @@ static int svc_sock_secure_port(struct svc_rqst *rqstp) return svc_port_is_privileged(svc_addr(rqstp)); } +static void svc_no_smp_mb(void) +{ + /* + * Kosuke Tatsukawa points out there should normally be an + * smp_mb() at the callsites of this function. (Either that or + * we could just drop the waitqueue_active() checks.) + * + * It appears they aren't currently necessary, though, basically + * because nfsd does non-blocking reads from these sockets, so + * the only places we wait on this waitqueue is in sendpage and + * sendmsg, which won't be waiting for wakeups on newly arrived + * data. + * + * Maybe we should add the memory barriers anyway, but these are + * hot paths so we'd need to be convinced there's no sigificant + * penalty. + */ +} + /* * INET callback when data has been received on the socket. 
*/ @@ -414,7 +433,7 @@ static void svc_udp_data_ready(struct sock *sk) set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); svc_xprt_enqueue(&svsk->sk_xprt); } - smp_mb(); + svc_no_smp_mb(); if (wq && waitqueue_active(wq)) wake_up_interruptible(wq); } @@ -433,7 +452,7 @@ static void svc_write_space(struct sock *sk) svc_xprt_enqueue(&svsk->sk_xprt); } - smp_mb(); + svc_no_smp_mb(); if (wq && waitqueue_active(wq)) { dprintk("RPC svc_write_space: someone sleeping on %p\n", svsk); @@ -789,7 +808,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk) } wq = sk_sleep(sk); - smp_mb(); + svc_no_smp_mb(); if (wq && waitqueue_active(wq)) wake_up_interruptible_all(wq); } @@ -811,7 +830,7 @@ static void svc_tcp_state_change(struct sock *sk) set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags); svc_xprt_enqueue(&svsk->sk_xprt); } - smp_mb(); + svc_no_smp_mb(); if (wq && waitqueue_active(wq)) wake_up_interruptible_all(wq); } @@ -827,7 +846,7 @@ static void svc_tcp_data_ready(struct sock *sk) set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); svc_xprt_enqueue(&svsk->sk_xprt); } - smp_mb(); + svc_no_smp_mb(); if (wq && waitqueue_active(wq)) wake_up_interruptible(wq); } @@ -1599,7 +1618,7 @@ static void svc_sock_detach(struct svc_xprt *xprt) sk->sk_write_space = svsk->sk_owspace; wq = sk_sleep(sk); - smp_mb(); + svc_no_smp_mb(); if (wq && waitqueue_active(wq)) wake_up_interruptible(wq); } ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc 2015-10-15 20:57 ` J. Bruce Fields @ 2015-10-16 0:49 ` Neil Brown 2015-10-16 1:46 ` Kosuke Tatsukawa 1 sibling, 0 replies; 28+ messages in thread From: Neil Brown @ 2015-10-16 0:49 UTC (permalink / raw) To: J. Bruce Fields, Kosuke Tatsukawa Cc: Trond Myklebust, Anna Schumaker, Jeff Layton, David S. Miller, linux-nfs, netdev, linux-kernel [-- Attachment #1: Type: text/plain, Size: 4818 bytes --] "J. Bruce Fields" <bfields@fieldses.org> writes: > On Thu, Oct 15, 2015 at 11:44:20AM +0000, Kosuke Tatsukawa wrote: >> Tatsukawa Kosuke wrote: >> > J. Bruce Fields wrote: >> >> Thanks for the detailed investigation. >> >> >> >> I think it would be worth adding a comment if that might help someone >> >> having to reinvestigate this again some day. >> > >> > It would be nice, but I find it difficult to write a comment in the >> > sunrpc layer why a memory barrier isn't necessary, using the knowledge >> > of how nfsd uses it, and the current implementation of the network code. >> > >> > Personally, I would prefer removing the call to waitqueue_active() which >> > would make the memory barrier totally unnecessary at the cost of a >> > spin_lock + spin_unlock by unconditionally calling >> > wake_up_interruptible. >> >> On second thought, the callbacks will be called frequently from the tcp >> code, so it wouldn't be a good idea. > > So, I was even considering documenting it like this, if it's not > overkill. > > Hmm... but if this is right, then we may as well ask why we're doing the > wakeups at all. Might be educational to test the code with them > removed. > > --b. > > commit 0882cfeb39e0 > Author: J. Bruce Fields <bfields@redhat.com> > Date: Thu Oct 15 16:53:41 2015 -0400 > > svcrpc: document lack of some memory barriers. > > Kosuke Tatsukawa points out an odd lack of memory barriers in some sites > here. I think the code's correct, but it's probably worth documenting. 
> > Reported-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c > index 856407fa085e..90480993ec4a 100644 > --- a/net/sunrpc/svcsock.c > +++ b/net/sunrpc/svcsock.c > @@ -399,6 +399,25 @@ static int svc_sock_secure_port(struct svc_rqst *rqstp) > return svc_port_is_privileged(svc_addr(rqstp)); > } > > +static void svc_no_smp_mb(void) > +{ > + /* > + * Kosuke Tatsukawa points out there should normally be an > + * smp_mb() at the callsites of this function. (Either that or > + * we could just drop the waitqueue_active() checks.) > + * > + * It appears they aren't currently necessary, though, basically > + * because nfsd does non-blocking reads from these sockets, so > + * the only places we wait on this waitqueue is in sendpage and > + * sendmsg, which won't be waiting for wakeups on newly arrived > + * data. > + * > + * Maybe we should add the memory barriers anyway, but these are > + * hot paths so we'd need to be convinced there's no sigificant > + * penalty. > + */ > +} > + > /* > * INET callback when data has been received on the socket. 
> */ > @@ -414,7 +433,7 @@ static void svc_udp_data_ready(struct sock *sk) > set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); > svc_xprt_enqueue(&svsk->sk_xprt); > } > - smp_mb(); > + svc_no_smp_mb(); > if (wq && waitqueue_active(wq)) > wake_up_interruptible(wq); > } > @@ -433,7 +452,7 @@ static void svc_write_space(struct sock *sk) > svc_xprt_enqueue(&svsk->sk_xprt); > } > > - smp_mb(); > + svc_no_smp_mb(); > if (wq && waitqueue_active(wq)) { > dprintk("RPC svc_write_space: someone sleeping on %p\n", > svsk); > @@ -789,7 +808,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk) > } > > wq = sk_sleep(sk); > - smp_mb(); > + svc_no_smp_mb(); > if (wq && waitqueue_active(wq)) > wake_up_interruptible_all(wq); > } > @@ -811,7 +830,7 @@ static void svc_tcp_state_change(struct sock *sk) > set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags); > svc_xprt_enqueue(&svsk->sk_xprt); > } > - smp_mb(); > + svc_no_smp_mb(); > if (wq && waitqueue_active(wq)) > wake_up_interruptible_all(wq); > } > @@ -827,7 +846,7 @@ static void svc_tcp_data_ready(struct sock *sk) > set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); > svc_xprt_enqueue(&svsk->sk_xprt); > } > - smp_mb(); > + svc_no_smp_mb(); > if (wq && waitqueue_active(wq)) > wake_up_interruptible(wq); > } > @@ -1599,7 +1618,7 @@ static void svc_sock_detach(struct svc_xprt *xprt) > sk->sk_write_space = svsk->sk_owspace; > > wq = sk_sleep(sk); > - smp_mb(); > + svc_no_smp_mb(); > if (wq && waitqueue_active(wq)) > wake_up_interruptible(wq); > } I would feel a lot more comfortable if you instead created: static inline bool sunrpc_waitqueue_active(struct wait_queue_head *wq) { if (!wq) return false; /* long comment about not needing a memory barrier */ return waitqueue_active(wq); } and then replace various "if (wq && waitqueue_active(wq))" calls with "if (sunrpc_waitqueue_active(wq))" The comment seems readable and seems to make sense.
NeilBrown ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc @ 2015-10-16 1:46 ` Kosuke Tatsukawa 0 siblings, 0 replies; 28+ messages in thread From: Kosuke Tatsukawa @ 2015-10-16 1:46 UTC (permalink / raw) To: J. Bruce Fields Cc: Trond Myklebust, Neil Brown, Anna Schumaker, Jeff Layton, David S. Miller, linux-nfs, netdev, linux-kernel J. Bruce Fields wrote: > On Thu, Oct 15, 2015 at 11:44:20AM +0000, Kosuke Tatsukawa wrote: >> Tatsukawa Kosuke wrote: >> > J. Bruce Fields wrote: >> >> Thanks for the detailed investigation. >> >> >> >> I think it would be worth adding a comment if that might help someone >> >> having to reinvestigate this again some day. >> > >> > It would be nice, but I find it difficult to write a comment in the >> > sunrpc layer why a memory barrier isn't necessary, using the knowledge >> > of how nfsd uses it, and the current implementation of the network code. >> > >> > Personally, I would prefer removing the call to waitqueue_active() which >> > would make the memory barrier totally unnecessary at the cost of a >> > spin_lock + spin_unlock by unconditionally calling >> > wake_up_interruptible. >> >> On second thought, the callbacks will be called frequently from the tcp >> code, so it wouldn't be a good idea. > > So, I was even considering documenting it like this, if it's not > overkill. > > Hmm... but if this is right, then we may as well ask why we're doing the > wakeups at all. Might be educational to test the code with them > removed. sk_write_space will be called in sock_wfree() with UDP/IP each time kfree_skb() is called. With TCP/IP, sk_write_space is only called if SOCK_NOSPACE has been set. sk_data_ready will be called in both tcp_rcv_established() for TCP/IP and in sock_queue_rcv_skb() for UDP/IP. The latter lacks a memory barrier with sk_data_ready called right after __skb_queue_tail(). I think this hasn't caused any problems because sk_data_ready wasn't used. > --b. > > commit 0882cfeb39e0 > Author: J. 
Bruce Fields <bfields@redhat.com> > Date: Thu Oct 15 16:53:41 2015 -0400 > > svcrpc: document lack of some memory barriers. > > Kosuke Tatsukawa points out an odd lack of memory barriers in some sites > here. I think the code's correct, but it's probably worth documenting. > > Reported-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c > index 856407fa085e..90480993ec4a 100644 > --- a/net/sunrpc/svcsock.c > +++ b/net/sunrpc/svcsock.c > @@ -399,6 +399,25 @@ static int svc_sock_secure_port(struct svc_rqst *rqstp) > return svc_port_is_privileged(svc_addr(rqstp)); > } > > +static void svc_no_smp_mb(void) > +{ > + /* > + * Kosuke Tatsukawa points out there should normally be an > + * smp_mb() at the callsites of this function. (Either that or > + * we could just drop the waitqueue_active() checks.) > + * > + * It appears they aren't currently necessary, though, basically > + * because nfsd does non-blocking reads from these sockets, so > + * the only places we wait on this waitqueue is in sendpage and > + * sendmsg, which won't be waiting for wakeups on newly arrived > + * data. > + * > + * Maybe we should add the memory barriers anyway, but these are > + * hot paths so we'd need to be convinced there's no sigificant > + * penalty. > + */ > +} > + > /* > * INET callback when data has been received on the socket. 
> */ > @@ -414,7 +433,7 @@ static void svc_udp_data_ready(struct sock *sk) > set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); > svc_xprt_enqueue(&svsk->sk_xprt); > } > - smp_mb(); > + svc_no_smp_mb(); > if (wq && waitqueue_active(wq)) > wake_up_interruptible(wq); > } > @@ -433,7 +452,7 @@ static void svc_write_space(struct sock *sk) > svc_xprt_enqueue(&svsk->sk_xprt); > } > > - smp_mb(); > + svc_no_smp_mb(); > if (wq && waitqueue_active(wq)) { > dprintk("RPC svc_write_space: someone sleeping on %p\n", > svsk); > @@ -789,7 +808,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk) > } > > wq = sk_sleep(sk); > - smp_mb(); > + svc_no_smp_mb(); > if (wq && waitqueue_active(wq)) > wake_up_interruptible_all(wq); > } > @@ -811,7 +830,7 @@ static void svc_tcp_state_change(struct sock *sk) > set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags); > svc_xprt_enqueue(&svsk->sk_xprt); > } > - smp_mb(); > + svc_no_smp_mb(); > if (wq && waitqueue_active(wq)) > wake_up_interruptible_all(wq); > } > @@ -827,7 +846,7 @@ static void svc_tcp_data_ready(struct sock *sk) > set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); > svc_xprt_enqueue(&svsk->sk_xprt); > } > - smp_mb(); > + svc_no_smp_mb(); > if (wq && waitqueue_active(wq)) > wake_up_interruptible(wq); > } > @@ -1599,7 +1618,7 @@ static void svc_sock_detach(struct svc_xprt *xprt) > sk->sk_write_space = svsk->sk_owspace; > > wq = sk_sleep(sk); > - smp_mb(); > + svc_no_smp_mb(); > if (wq && waitqueue_active(wq)) > wake_up_interruptible(wq); > } --- Kosuke TATSUKAWA | 3rd IT Platform Department | IT Platform Division, NEC Corporation | tatsu@ab.jp.nec.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc @ 2015-10-16 2:28 ` Kosuke Tatsukawa 0 siblings, 0 replies; 28+ messages in thread From: Kosuke Tatsukawa @ 2015-10-16 2:28 UTC (permalink / raw) To: J. Bruce Fields Cc: Trond Myklebust, Neil Brown, Anna Schumaker, Jeff Layton, David S. Miller, linux-nfs, netdev, linux-kernel Tatsukawa Kosuke wrote: > J. Bruce Fields wrote: >> On Thu, Oct 15, 2015 at 11:44:20AM +0000, Kosuke Tatsukawa wrote: >>> Tatsukawa Kosuke wrote: >>> > J. Bruce Fields wrote: >>> >> Thanks for the detailed investigation. >>> >> >>> >> I think it would be worth adding a comment if that might help someone >>> >> having to reinvestigate this again some day. >>> > >>> > It would be nice, but I find it difficult to write a comment in the >>> > sunrpc layer why a memory barrier isn't necessary, using the knowledge >>> > of how nfsd uses it, and the current implementation of the network code. >>> > >>> > Personally, I would prefer removing the call to waitqueue_active() which >>> > would make the memory barrier totally unnecessary at the cost of a >>> > spin_lock + spin_unlock by unconditionally calling >>> > wake_up_interruptible. >>> >>> On second thought, the callbacks will be called frequently from the tcp >>> code, so it wouldn't be a good idea. >> >> So, I was even considering documenting it like this, if it's not >> overkill. >> >> Hmm... but if this is right, then we may as well ask why we're doing the >> wakeups at all. Might be educational to test the code with them >> removed. > > sk_write_space will be called in sock_wfree() with UDP/IP each time > kfree_skb() is called. With TCP/IP, sk_write_space is only called if > SOCK_NOSPACE has been set. > > sk_data_ready will be called in both tcp_rcv_established() for TCP/IP > and in sock_queue_rcv_skb() for UDP/IP. The latter lacks a memory > barrier with sk_data_ready called right after __skb_queue_tail(). 
> I think this hasn't caused any problems because sk_data_ready wasn't > used. Actually, svc_udp_data_ready() calls set_bit() which is an atomic operation. So there won't be a problem unless svsk is NULL. >> --b. >> >> commit 0882cfeb39e0 >> Author: J. Bruce Fields <bfields@redhat.com> >> Date: Thu Oct 15 16:53:41 2015 -0400 >> >> svcrpc: document lack of some memory barriers. >> >> Kosuke Tatsukawa points out an odd lack of memory barriers in some sites >> here. I think the code's correct, but it's probably worth documenting. >> >> Reported-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> >> >> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c >> index 856407fa085e..90480993ec4a 100644 >> --- a/net/sunrpc/svcsock.c >> +++ b/net/sunrpc/svcsock.c >> @@ -399,6 +399,25 @@ static int svc_sock_secure_port(struct svc_rqst *rqstp) >> return svc_port_is_privileged(svc_addr(rqstp)); >> } >> >> +static void svc_no_smp_mb(void) >> +{ >> + /* >> + * Kosuke Tatsukawa points out there should normally be an >> + * smp_mb() at the callsites of this function. (Either that or >> + * we could just drop the waitqueue_active() checks.) >> + * >> + * It appears they aren't currently necessary, though, basically >> + * because nfsd does non-blocking reads from these sockets, so >> + * the only places we wait on this waitqueue is in sendpage and >> + * sendmsg, which won't be waiting for wakeups on newly arrived >> + * data. >> + * >> + * Maybe we should add the memory barriers anyway, but these are >> + * hot paths so we'd need to be convinced there's no sigificant >> + * penalty. >> + */ >> +} >> + >> /* >> * INET callback when data has been received on the socket. 
>> */ >> @@ -414,7 +433,7 @@ static void svc_udp_data_ready(struct sock *sk) >> set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); >> svc_xprt_enqueue(&svsk->sk_xprt); >> } >> - smp_mb(); >> + svc_no_smp_mb(); >> if (wq && waitqueue_active(wq)) >> wake_up_interruptible(wq); >> } >> @@ -433,7 +452,7 @@ static void svc_write_space(struct sock *sk) >> svc_xprt_enqueue(&svsk->sk_xprt); >> } >> >> - smp_mb(); >> + svc_no_smp_mb(); >> if (wq && waitqueue_active(wq)) { >> dprintk("RPC svc_write_space: someone sleeping on %p\n", >> svsk); >> @@ -789,7 +808,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk) >> } >> >> wq = sk_sleep(sk); >> - smp_mb(); >> + svc_no_smp_mb(); >> if (wq && waitqueue_active(wq)) >> wake_up_interruptible_all(wq); >> } >> @@ -811,7 +830,7 @@ static void svc_tcp_state_change(struct sock *sk) >> set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags); >> svc_xprt_enqueue(&svsk->sk_xprt); >> } >> - smp_mb(); >> + svc_no_smp_mb(); >> if (wq && waitqueue_active(wq)) >> wake_up_interruptible_all(wq); >> } >> @@ -827,7 +846,7 @@ static void svc_tcp_data_ready(struct sock *sk) >> set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); >> svc_xprt_enqueue(&svsk->sk_xprt); >> } >> - smp_mb(); >> + svc_no_smp_mb(); >> if (wq && waitqueue_active(wq)) >> wake_up_interruptible(wq); >> } >> @@ -1599,7 +1618,7 @@ static void svc_sock_detach(struct svc_xprt *xprt) >> sk->sk_write_space = svsk->sk_owspace; >> >> wq = sk_sleep(sk); >> - smp_mb(); >> + svc_no_smp_mb(); >> if (wq && waitqueue_active(wq)) >> wake_up_interruptible(wq); >> } --- Kosuke TATSUKAWA | 3rd IT Platform Department | IT Platform Division, NEC Corporation | tatsu@ab.jp.nec.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc 2015-10-16 2:28 ` Kosuke Tatsukawa (?) @ 2015-10-22 16:31 ` J. Bruce Fields 2015-10-23 4:14 ` Kosuke Tatsukawa -1 siblings, 1 reply; 28+ messages in thread From: J. Bruce Fields @ 2015-10-22 16:31 UTC (permalink / raw) To: Kosuke Tatsukawa Cc: Trond Myklebust, Neil Brown, Anna Schumaker, Jeff Layton, David S. Miller, linux-nfs, netdev, linux-kernel On Fri, Oct 16, 2015 at 02:28:10AM +0000, Kosuke Tatsukawa wrote: > Tatsukawa Kosuke wrote: > > J. Bruce Fields wrote: > >> On Thu, Oct 15, 2015 at 11:44:20AM +0000, Kosuke Tatsukawa wrote: > >>> Tatsukawa Kosuke wrote: > >>> > J. Bruce Fields wrote: > >>> >> Thanks for the detailed investigation. > >>> >> > >>> >> I think it would be worth adding a comment if that might help someone > >>> >> having to reinvestigate this again some day. > >>> > > >>> > It would be nice, but I find it difficult to write a comment in the > >>> > sunrpc layer why a memory barrier isn't necessary, using the knowledge > >>> > of how nfsd uses it, and the current implementation of the network code. > >>> > > >>> > Personally, I would prefer removing the call to waitqueue_active() which > >>> > would make the memory barrier totally unnecessary at the cost of a > >>> > spin_lock + spin_unlock by unconditionally calling > >>> > wake_up_interruptible. > >>> > >>> On second thought, the callbacks will be called frequently from the tcp > >>> code, so it wouldn't be a good idea. > >> > >> So, I was even considering documenting it like this, if it's not > >> overkill. > >> > >> Hmm... but if this is right, then we may as well ask why we're doing the > >> wakeups at all. Might be educational to test the code with them > >> removed. > > > > sk_write_space will be called in sock_wfree() with UDP/IP each time > > kfree_skb() is called. With TCP/IP, sk_write_space is only called if > > SOCK_NOSPACE has been set. 
> > > > sk_data_ready will be called in both tcp_rcv_established() for TCP/IP > > and in sock_queue_rcv_skb() for UDP/IP. The latter lacks a memory > > barrier with sk_data_ready called right after __skb_queue_tail(). > > I think this hasn't caused any problems because sk_data_ready wasn't > > used. > > Actually, svc_udp_data_ready() calls set_bit() which is an atomic > operation. So there won't be a problem unless svsk is NULL. So is it true that every caller of these socket callbacks has adequate memory barriers between the time the change is made visible and the time the callback is called? If so, then there's nothing really specific about nfsd here. In that case maybe it's the networking code that could use some documentation, if it doesn't already? (Or maybe common helper functions for this if (waitqueue_active(wq)) wake_up(wq) pattern?) --b. > > > >> --b. > >> > >> commit 0882cfeb39e0 > >> Author: J. Bruce Fields <bfields@redhat.com> > >> Date: Thu Oct 15 16:53:41 2015 -0400 > >> > >> svcrpc: document lack of some memory barriers. > >> > >> Kosuke Tatsukawa points out an odd lack of memory barriers in some sites > >> here. I think the code's correct, but it's probably worth documenting. > >> > >> Reported-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> > >> > >> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c > >> index 856407fa085e..90480993ec4a 100644 > >> --- a/net/sunrpc/svcsock.c > >> +++ b/net/sunrpc/svcsock.c > >> @@ -399,6 +399,25 @@ static int svc_sock_secure_port(struct svc_rqst *rqstp) > >> return svc_port_is_privileged(svc_addr(rqstp)); > >> } > >> > >> +static void svc_no_smp_mb(void) > >> +{ > >> + /* > >> + * Kosuke Tatsukawa points out there should normally be an > >> + * smp_mb() at the callsites of this function. (Either that or > >> + * we could just drop the waitqueue_active() checks.)
> >> + * > >> + * It appears they aren't currently necessary, though, basically > >> + * because nfsd does non-blocking reads from these sockets, so > >> + * the only places we wait on this waitqueue is in sendpage and > >> + * sendmsg, which won't be waiting for wakeups on newly arrived > >> + * data. > >> + * > >> + * Maybe we should add the memory barriers anyway, but these are > >> + * hot paths so we'd need to be convinced there's no sigificant > >> + * penalty. > >> + */ > >> +} > >> + > >> /* > >> * INET callback when data has been received on the socket. > >> */ > >> @@ -414,7 +433,7 @@ static void svc_udp_data_ready(struct sock *sk) > >> set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); > >> svc_xprt_enqueue(&svsk->sk_xprt); > >> } > >> - smp_mb(); > >> + svc_no_smp_mb(); > >> if (wq && waitqueue_active(wq)) > >> wake_up_interruptible(wq); > >> } > >> @@ -433,7 +452,7 @@ static void svc_write_space(struct sock *sk) > >> svc_xprt_enqueue(&svsk->sk_xprt); > >> } > >> > >> - smp_mb(); > >> + svc_no_smp_mb(); > >> if (wq && waitqueue_active(wq)) { > >> dprintk("RPC svc_write_space: someone sleeping on %p\n", > >> svsk); > >> @@ -789,7 +808,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk) > >> } > >> > >> wq = sk_sleep(sk); > >> - smp_mb(); > >> + svc_no_smp_mb(); > >> if (wq && waitqueue_active(wq)) > >> wake_up_interruptible_all(wq); > >> } > >> @@ -811,7 +830,7 @@ static void svc_tcp_state_change(struct sock *sk) > >> set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags); > >> svc_xprt_enqueue(&svsk->sk_xprt); > >> } > >> - smp_mb(); > >> + svc_no_smp_mb(); > >> if (wq && waitqueue_active(wq)) > >> wake_up_interruptible_all(wq); > >> } > >> @@ -827,7 +846,7 @@ static void svc_tcp_data_ready(struct sock *sk) > >> set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); > >> svc_xprt_enqueue(&svsk->sk_xprt); > >> } > >> - smp_mb(); > >> + svc_no_smp_mb(); > >> if (wq && waitqueue_active(wq)) > >> wake_up_interruptible(wq); > >> } > >> @@ -1599,7 +1618,7 @@ static void 
svc_sock_detach(struct svc_xprt *xprt) > >> sk->sk_write_space = svsk->sk_owspace; > >> > >> wq = sk_sleep(sk); > >> - smp_mb(); > >> + svc_no_smp_mb(); > >> if (wq && waitqueue_active(wq)) > >> wake_up_interruptible(wq); > >> } > --- > Kosuke TATSUKAWA | 3rd IT Platform Department > | IT Platform Division, NEC Corporation > | tatsu@ab.jp.nec.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc @ 2015-10-23 4:14 ` Kosuke Tatsukawa 0 siblings, 0 replies; 28+ messages in thread From: Kosuke Tatsukawa @ 2015-10-23 4:14 UTC (permalink / raw) To: J. Bruce Fields Cc: Trond Myklebust, Neil Brown, Anna Schumaker, Jeff Layton, David S. Miller, linux-nfs, netdev, linux-kernel J. Bruce Fields wrote: > On Fri, Oct 16, 2015 at 02:28:10AM +0000, Kosuke Tatsukawa wrote: >> Tatsukawa Kosuke wrote: >> > J. Bruce Fields wrote: >> >> On Thu, Oct 15, 2015 at 11:44:20AM +0000, Kosuke Tatsukawa wrote: >> >>> Tatsukawa Kosuke wrote: >> >>> > J. Bruce Fields wrote: >> >>> >> Thanks for the detailed investigation. >> >>> >> >> >>> >> I think it would be worth adding a comment if that might help someone >> >>> >> having to reinvestigate this again some day. >> >>> > >> >>> > It would be nice, but I find it difficult to write a comment in the >> >>> > sunrpc layer why a memory barrier isn't necessary, using the knowledge >> >>> > of how nfsd uses it, and the current implementation of the network code. >> >>> > >> >>> > Personally, I would prefer removing the call to waitqueue_active() which >> >>> > would make the memory barrier totally unnecessary at the cost of a >> >>> > spin_lock + spin_unlock by unconditionally calling >> >>> > wake_up_interruptible. >> >>> >> >>> On second thought, the callbacks will be called frequently from the tcp >> >>> code, so it wouldn't be a good idea. >> >> >> >> So, I was even considering documenting it like this, if it's not >> >> overkill. >> >> >> >> Hmm... but if this is right, then we may as well ask why we're doing the >> >> wakeups at all. Might be educational to test the code with them >> >> removed. >> > >> > sk_write_space will be called in sock_wfree() with UDP/IP each time >> > kfree_skb() is called. With TCP/IP, sk_write_space is only called if >> > SOCK_NOSPACE has been set. 
>> > >> > sk_data_ready will be called in both tcp_rcv_established() for TCP/IP >> > and in sock_queue_rcv_skb() for UDP/IP. The latter lacks a memory >> > barrier with sk_data_ready called right after __skb_queue_tail(). >> > I think this hasn't caused any problems because sk_data_ready wasn't >> > used. >> >> Actually, svc_udp_data_ready() calls set_bit() which is an atomic >> operation. So there won't be a problem unless svsk is NULL. > > So is it true that every caller of these socket callbacks has adequate > memory barriers between the time the change is made visible and the time > the callback is called? > > If so, then there's nothing really specific about nfsd here. > > In that case maybe it's the networking code that use some documentation, > if it doesn't already? (Or maybe common helper functions for this > > if (waitqueue_active(wq)) > wake_up(wq) > > pattern?) Some of the other places defining these callback functions are using static inline bool wq_has_sleeper(struct socket_wq *wq) defined in include/net/sock.h. The comment above the function explains that it was introduced for exactly this purpose. Even though the argument variable uses the same name "wq", it has a different type from the wq used in svcsock.c (struct socket_wq * vs. wait_queue_head_t *). > --b. > >> >> >> >> --b. >> >> >> >> commit 0882cfeb39e0 >> >> Author: J. Bruce Fields <bfields@redhat.com> >> >> Date: Thu Oct 15 16:53:41 2015 -0400 >> >> >> >> svcrpc: document lack of some memory barriers. >> >> >> >> Kosuke Tatsukawa points out an odd lack of memory barriers in some sites >> >> here. I think the code's correct, but it's probably worth documenting.
>> >> >> >> Reported-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> >> >> >> >> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c >> >> index 856407fa085e..90480993ec4a 100644 >> >> --- a/net/sunrpc/svcsock.c >> >> +++ b/net/sunrpc/svcsock.c >> >> @@ -399,6 +399,25 @@ static int svc_sock_secure_port(struct svc_rqst *rqstp) >> >> return svc_port_is_privileged(svc_addr(rqstp)); >> >> } >> >> >> >> +static void svc_no_smp_mb(void) >> >> +{ >> >> + /* >> >> + * Kosuke Tatsukawa points out there should normally be an >> >> + * smp_mb() at the callsites of this function. (Either that or >> >> + * we could just drop the waitqueue_active() checks.) >> >> + * >> >> + * It appears they aren't currently necessary, though, basically >> >> + * because nfsd does non-blocking reads from these sockets, so >> >> + * the only places we wait on this waitqueue is in sendpage and >> >> + * sendmsg, which won't be waiting for wakeups on newly arrived >> >> + * data. >> >> + * >> >> + * Maybe we should add the memory barriers anyway, but these are >> >> + * hot paths so we'd need to be convinced there's no sigificant >> >> + * penalty. >> >> + */ >> >> +} >> >> + >> >> /* >> >> * INET callback when data has been received on the socket. 
>> >> */ >> >> @@ -414,7 +433,7 @@ static void svc_udp_data_ready(struct sock *sk) >> >> set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); >> >> svc_xprt_enqueue(&svsk->sk_xprt); >> >> } >> >> - smp_mb(); >> >> + svc_no_smp_mb(); >> >> if (wq && waitqueue_active(wq)) >> >> wake_up_interruptible(wq); >> >> } >> >> @@ -433,7 +452,7 @@ static void svc_write_space(struct sock *sk) >> >> svc_xprt_enqueue(&svsk->sk_xprt); >> >> } >> >> >> >> - smp_mb(); >> >> + svc_no_smp_mb(); >> >> if (wq && waitqueue_active(wq)) { >> >> dprintk("RPC svc_write_space: someone sleeping on %p\n", >> >> svsk); >> >> @@ -789,7 +808,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk) >> >> } >> >> >> >> wq = sk_sleep(sk); >> >> - smp_mb(); >> >> + svc_no_smp_mb(); >> >> if (wq && waitqueue_active(wq)) >> >> wake_up_interruptible_all(wq); >> >> } >> >> @@ -811,7 +830,7 @@ static void svc_tcp_state_change(struct sock *sk) >> >> set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags); >> >> svc_xprt_enqueue(&svsk->sk_xprt); >> >> } >> >> - smp_mb(); >> >> + svc_no_smp_mb(); >> >> if (wq && waitqueue_active(wq)) >> >> wake_up_interruptible_all(wq); >> >> } >> >> @@ -827,7 +846,7 @@ static void svc_tcp_data_ready(struct sock *sk) >> >> set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); >> >> svc_xprt_enqueue(&svsk->sk_xprt); >> >> } >> >> - smp_mb(); >> >> + svc_no_smp_mb(); >> >> if (wq && waitqueue_active(wq)) >> >> wake_up_interruptible(wq); >> >> } >> >> @@ -1599,7 +1618,7 @@ static void svc_sock_detach(struct svc_xprt *xprt) >> >> sk->sk_write_space = svsk->sk_owspace; >> >> >> >> wq = sk_sleep(sk); >> >> - smp_mb(); >> >> + svc_no_smp_mb(); >> >> if (wq && waitqueue_active(wq)) >> >> wake_up_interruptible(wq); >> >> } ^ permalink raw reply [flat|nested] 28+ messages in thread
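[Editorial note: the wq_has_sleeper() helper Kosuke points to pairs a full barrier with the sleeper check, which is exactly the common helper Bruce asked about. Its shape can be sketched as a userspace model; `struct my_wq` is an invented stand-in for a waitqueue head, not the kernel's struct socket_wq or wait_queue_head_t.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented stand-in for a waitqueue head: only tracks sleeper count. */
struct my_wq {
    atomic_int nr_sleepers;
};

/* Modeled on wq_has_sleeper() in include/net/sock.h: issue a full
 * barrier *before* testing for sleepers, so the waker's earlier
 * writes (e.g. queueing an skb) are ordered against this read and
 * pair with the barrier on the waiter side.  The NULL guard mirrors
 * the 'wq &&' test at the svcsock.c call sites. */
bool my_wq_has_sleeper(struct my_wq *wq)
{
    if (wq == NULL)
        return false;
    atomic_thread_fence(memory_order_seq_cst);  /* smp_mb() analogue */
    return atomic_load(&wq->nr_sleepers) > 0;
}
```

Call sites then collapse to `if (my_wq_has_sleeper(wq)) wake_up(...)`, with the ordering argument documented once in the helper rather than at every callback.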
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc @ 2015-10-23 20:49 ` J. Bruce Fields 0 siblings, 0 replies; 28+ messages in thread From: J. Bruce Fields @ 2015-10-23 20:49 UTC (permalink / raw) To: Kosuke Tatsukawa Cc: Trond Myklebust, Neil Brown, Anna Schumaker, Jeff Layton, David S. Miller, linux-nfs, netdev, linux-kernel On Fri, Oct 23, 2015 at 04:14:10AM +0000, Kosuke Tatsukawa wrote: > J. Bruce Fields wrote: > > On Fri, Oct 16, 2015 at 02:28:10AM +0000, Kosuke Tatsukawa wrote: > >> Tatsukawa Kosuke wrote: > >> > J. Bruce Fields wrote: > >> >> On Thu, Oct 15, 2015 at 11:44:20AM +0000, Kosuke Tatsukawa wrote: > >> >>> Tatsukawa Kosuke wrote: > >> >>> > J. Bruce Fields wrote: > >> >>> >> Thanks for the detailed investigation. > >> >>> >> > >> >>> >> I think it would be worth adding a comment if that might help someone > >> >>> >> having to reinvestigate this again some day. > >> >>> > > >> >>> > It would be nice, but I find it difficult to write a comment in the > >> >>> > sunrpc layer why a memory barrier isn't necessary, using the knowledge > >> >>> > of how nfsd uses it, and the current implementation of the network code. > >> >>> > > >> >>> > Personally, I would prefer removing the call to waitqueue_active() which > >> >>> > would make the memory barrier totally unnecessary at the cost of a > >> >>> > spin_lock + spin_unlock by unconditionally calling > >> >>> > wake_up_interruptible. > >> >>> > >> >>> On second thought, the callbacks will be called frequently from the tcp > >> >>> code, so it wouldn't be a good idea. > >> >> > >> >> So, I was even considering documenting it like this, if it's not > >> >> overkill. > >> >> > >> >> Hmm... but if this is right, then we may as well ask why we're doing the > >> >> wakeups at all. Might be educational to test the code with them > >> >> removed. > >> > > >> > sk_write_space will be called in sock_wfree() with UDP/IP each time > >> > kfree_skb() is called. 
With TCP/IP, sk_write_space is only called if > >> > SOCK_NOSPACE has been set. > >> > > >> > sk_data_ready will be called in both tcp_rcv_established() for TCP/IP > >> > and in sock_queue_rcv_skb() for UDP/IP. The latter lacks a memory > >> > barrier with sk_data_ready called right after __skb_queue_tail(). > >> > I think this hasn't caused any problems because sk_data_ready wasn't > >> > used. > >> > >> Actually, svc_udp_data_ready() calls set_bit() which is an atomic > >> operation. So there won't be a problem unless svsk is NULL. > > > > So is it true that every caller of these socket callbacks has adequate > > memory barriers between the time the change is made visible and the time > > the callback is called? > > > > If so, then there's nothing really specific about nfsd here. > > > > In that case maybe it's the networking code that use some documentation, > > if it doesn't already? (Or maybe common helper functions for this > > > > if (waitqueue_active(wq)) > > wake_up(wq) > > > > pattern?) > > Some of the other places defining these callback functions are using > static inline bool wq_has_sleeper(struct socket_wq *wq) > defined in include/net/sock.h > > The comment above the function explains that it was introduced for > exactly this purpose. > > Even thought the argument variable uses the same name "wq", it has a > different type from the wq used in svcsock.c (struct socket_wq * > vs. wait_queue_head_t *). OK, thanks. So, I guess it still sounds like the code is OK as is, but maybe my comment wasn't. Here's another attempt. --b. commit b805ca58a81a Author: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Date: Fri Oct 9 01:44:07 2015 +0000 svcrpc: document lack of some memory barriers We're missing memory barriers in net/sunrpc/svcsock.c in some spots we'd expect them. But it doesn't appear they're necessary in our case, and this is likely a hot path--for now just document the odd behavior. 
I found this issue when I was looking through the linux source code for places calling waitqueue_active() before wake_up*(), but without preceding memory barriers, after sending a patch to fix a similar issue in drivers/tty/n_tty.c (Details about the original issue can be found here: https://lkml.org/lkml/2015/9/28/849). Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> [bfields@redhat.com,nfbrown@novell.com: document instead of adding barriers] Signed-off-by: J. Bruce Fields <bfields@redhat.com> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 48923730722d..1f71eece04d3 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -399,6 +399,31 @@ static int svc_sock_secure_port(struct svc_rqst *rqstp) return svc_port_is_privileged(svc_addr(rqstp)); } +static bool sunrpc_waitqueue_active(wait_queue_head_t *wq) +{ + if (!wq) + return false; + /* + * Kosuke Tatsukawa points out there should normally be a memory + * barrier here--see wq_has_sleeper(). + * + * It appears that isn't currently necessary, though, basically + * because callers all appear to have sufficient memory barriers + * between the time the relevant change is made and the + * time they call these callbacks. + * + * The nfsd code itself doesn't actually explicitly wait on + * these waitqueues, but it may wait on them for example in + * sendpage() or sendmsg() calls. (And those may be the only + * places, since it uses nonblocking reads.) + * + * Maybe we should add the memory barriers anyway, but these are + * hot paths so we'd need to be convinced there's no significant + * penalty. + */ + return waitqueue_active(wq); +} + /* * INET callback when data has been received on the socket.
*/ @@ -414,7 +439,7 @@ static void svc_udp_data_ready(struct sock *sk) set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); svc_xprt_enqueue(&svsk->sk_xprt); } - if (wq && waitqueue_active(wq)) + if (sunrpc_waitqueue_active(wq)) wake_up_interruptible(wq); } @@ -432,7 +457,7 @@ static void svc_write_space(struct sock *sk) svc_xprt_enqueue(&svsk->sk_xprt); } - if (wq && waitqueue_active(wq)) { + if (sunrpc_waitqueue_active(wq)) { dprintk("RPC svc_write_space: someone sleeping on %p\n", svsk); wake_up_interruptible(wq); @@ -787,7 +812,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk) } wq = sk_sleep(sk); - if (wq && waitqueue_active(wq)) + if (sunrpc_waitqueue_active(wq)) wake_up_interruptible_all(wq); } @@ -808,7 +833,7 @@ static void svc_tcp_state_change(struct sock *sk) set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags); svc_xprt_enqueue(&svsk->sk_xprt); } - if (wq && waitqueue_active(wq)) + if (sunrpc_waitqueue_active(wq)) wake_up_interruptible_all(wq); } @@ -823,7 +848,7 @@ static void svc_tcp_data_ready(struct sock *sk) set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); svc_xprt_enqueue(&svsk->sk_xprt); } - if (wq && waitqueue_active(wq)) + if (sunrpc_waitqueue_active(wq)) wake_up_interruptible(wq); } @@ -1594,7 +1619,7 @@ static void svc_sock_detach(struct svc_xprt *xprt) sk->sk_write_space = svsk->sk_owspace; wq = sk_sleep(sk); - if (wq && waitqueue_active(wq)) + if (sunrpc_waitqueue_active(wq)) wake_up_interruptible(wq); } ^ permalink raw reply related [flat|nested] 28+ messages in thread
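[Editorial note: the net effect of Bruce's final patch is a refactoring pattern — fold the NULL guard and the (deliberately barrier-less) waitqueue_active() test into one predicate, so the justification lives in a single comment instead of at six call sites. The same shape, modeled in plain C with invented names for illustration:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of a waitqueue for illustration only. */
struct waitq {
    int nr_waiting;
};

/* Stand-in for the kernel's waitqueue_active(): a plain, unordered
 * read of whether anyone is waiting. */
bool waitq_active(const struct waitq *wq)
{
    return wq->nr_waiting > 0;
}

/* The refactoring in the patch: every call site's
 *     if (wq && waitq_active(wq))
 * becomes
 *     if (my_waitqueue_active(wq))
 * Like sunrpc_waitqueue_active(), this predicate deliberately omits
 * a memory barrier and is the one place to document why that is
 * (believed) safe for these callers. */
bool my_waitqueue_active(const struct waitq *wq)
{
    return wq != NULL && waitq_active(wq);
}
```

The short-circuit `&&` preserves the original NULL-safety: a NULL waitqueue never reaches the active check.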
* Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc @ 2015-10-24 1:19 ` Kosuke Tatsukawa 0 siblings, 0 replies; 28+ messages in thread From: Kosuke Tatsukawa @ 2015-10-24 1:19 UTC (permalink / raw) To: J. Bruce Fields Cc: Trond Myklebust, Neil Brown, Anna Schumaker, Jeff Layton, David S. Miller, linux-nfs, netdev, linux-kernel J. Bruce Fields wrote: > On Fri, Oct 23, 2015 at 04:14:10AM +0000, Kosuke Tatsukawa wrote: >> J. Bruce Fields wrote: >> > On Fri, Oct 16, 2015 at 02:28:10AM +0000, Kosuke Tatsukawa wrote: >> >> Tatsukawa Kosuke wrote: >> >> > J. Bruce Fields wrote: >> >> >> On Thu, Oct 15, 2015 at 11:44:20AM +0000, Kosuke Tatsukawa wrote: >> >> >>> Tatsukawa Kosuke wrote: >> >> >>> > J. Bruce Fields wrote: >> >> >>> >> Thanks for the detailed investigation. >> >> >>> >> >> >> >>> >> I think it would be worth adding a comment if that might help someone >> >> >>> >> having to reinvestigate this again some day. >> >> >>> > >> >> >>> > It would be nice, but I find it difficult to write a comment in the >> >> >>> > sunrpc layer why a memory barrier isn't necessary, using the knowledge >> >> >>> > of how nfsd uses it, and the current implementation of the network code. >> >> >>> > >> >> >>> > Personally, I would prefer removing the call to waitqueue_active() which >> >> >>> > would make the memory barrier totally unnecessary at the cost of a >> >> >>> > spin_lock + spin_unlock by unconditionally calling >> >> >>> > wake_up_interruptible. >> >> >>> >> >> >>> On second thought, the callbacks will be called frequently from the tcp >> >> >>> code, so it wouldn't be a good idea. >> >> >> >> >> >> So, I was even considering documenting it like this, if it's not >> >> >> overkill. >> >> >> >> >> >> Hmm... but if this is right, then we may as well ask why we're doing the >> >> >> wakeups at all. Might be educational to test the code with them >> >> >> removed. 
>> >> > >> >> > sk_write_space will be called in sock_wfree() with UDP/IP each time >> >> > kfree_skb() is called. With TCP/IP, sk_write_space is only called if >> >> > SOCK_NOSPACE has been set. >> >> > >> >> > sk_data_ready will be called in both tcp_rcv_established() for TCP/IP >> >> > and in sock_queue_rcv_skb() for UDP/IP. The latter lacks a memory >> >> > barrier with sk_data_ready called right after __skb_queue_tail(). >> >> > I think this hasn't caused any problems because sk_data_ready wasn't >> >> > used. >> >> >> >> Actually, svc_udp_data_ready() calls set_bit() which is an atomic >> >> operation. So there won't be a problem unless svsk is NULL. >> > >> > So is it true that every caller of these socket callbacks has adequate >> > memory barriers between the time the change is made visible and the time >> > the callback is called? >> > >> > If so, then there's nothing really specific about nfsd here. >> > >> > In that case maybe it's the networking code that could use some documentation, >> > if it doesn't already? (Or maybe common helper functions for this >> > >> > if (waitqueue_active(wq)) >> > wake_up(wq) >> > >> > pattern?) >> >> Some of the other places defining these callback functions are using >> static inline bool wq_has_sleeper(struct socket_wq *wq) >> defined in include/net/sock.h >> >> The comment above the function explains that it was introduced for >> exactly this purpose. >> >> Even though the argument variable uses the same name "wq", it has a >> different type from the wq used in svcsock.c (struct socket_wq * >> vs. wait_queue_head_t *). > > OK, thanks. So, I guess it still sounds like the code is OK as is, but > maybe my comment wasn't. Here's another attempt. Thank you. By now the patch looks completely different from my original patch, so I don't think I deserve to be mentioned in the Author line. > --b.
> > commit b805ca58a81a > Author: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> > Date: Fri Oct 9 01:44:07 2015 +0000 > > svcrpc: document lack of some memory barriers > > We're missing memory barriers in net/sunrpc/svcsock.c in some spots we'd > expect them. But it doesn't appear they're necessary in our case, and > this is likely a hot path--for now just document the odd behavior. > > I found this issue when I was looking through the linux source code > for places calling waitqueue_active() before wake_up*(), but without > preceding memory barriers, after sending a patch to fix a similar > issue in drivers/tty/n_tty.c (Details about the original issue can be > found here: https://lkml.org/lkml/2015/9/28/849). I should have used the stable link format to refer to the discussion in LKML instead of the lkml.org URL. The stable link is http://lkml.kernel.org/r/17EC94B0A072C34B8DCF0D30AD16044A02871D53@BPXM09GP.gisp.nec.co.jp > > Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> > [bfields@redhat.com,nfbrown@novell.com: document instead of adding barriers] > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c > index 48923730722d..1f71eece04d3 100644 > --- a/net/sunrpc/svcsock.c > +++ b/net/sunrpc/svcsock.c > @@ -399,6 +399,31 @@ static int svc_sock_secure_port(struct svc_rqst *rqstp) > return svc_port_is_privileged(svc_addr(rqstp)); > } > > +static bool sunrpc_waitqueue_active(wait_queue_head_t *wq) > +{ > + if (!wq) > + return false; > + /* > + * Kosuke Tatsukawa points out there should normally be a memory > + * barrier here--see wq_has_sleeper(). Having my name in the comment itself looks a little odd, since I don't see other places in the kernel source code that mention the reporter's name.
> + * > + * It appears that isn't currently necessary, though, basically > + * because callers all appear to have sufficient memory barriers > + * between the time the relevant change is made and the > + * time they call these callbacks. > + * > + * The nfsd code itself doesn't actually explicitly wait on > + * these waitqueues, but it may wait on them for example in > + * sendpage() or sendmsg() calls. (And those may be the only > + * places, since it it uses nonblocking reads.) The above line contains an extra "it". > + * > + * Maybe we should add the memory barriers anyway, but these are > + * hot paths so we'd need to be convinced there's no sigificant > + * penalty. > + */ > + return waitqueue_active(wq); > +} > + > /* > * INET callback when data has been received on the socket. > */ > @@ -414,7 +439,7 @@ static void svc_udp_data_ready(struct sock *sk) > set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); > svc_xprt_enqueue(&svsk->sk_xprt); > } > - if (wq && waitqueue_active(wq)) > + if (sunrpc_waitqueue_active(wq)) > wake_up_interruptible(wq); > } > > @@ -432,7 +457,7 @@ static void svc_write_space(struct sock *sk) > svc_xprt_enqueue(&svsk->sk_xprt); > } > > - if (wq && waitqueue_active(wq)) { > + if (sunrpc_waitqueue_active(wq)) { > dprintk("RPC svc_write_space: someone sleeping on %p\n", > svsk); > wake_up_interruptible(wq); > @@ -787,7 +812,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk) > } > > wq = sk_sleep(sk); > - if (wq && waitqueue_active(wq)) > + if (sunrpc_waitqueue_active(wq)) > wake_up_interruptible_all(wq); > } > > @@ -808,7 +833,7 @@ static void svc_tcp_state_change(struct sock *sk) > set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags); > svc_xprt_enqueue(&svsk->sk_xprt); > } > - if (wq && waitqueue_active(wq)) > + if (sunrpc_waitqueue_active(wq)) > wake_up_interruptible_all(wq); > } > > @@ -823,7 +848,7 @@ static void svc_tcp_data_ready(struct sock *sk) > set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); > svc_xprt_enqueue(&svsk->sk_xprt); > } > - if 
(wq && waitqueue_active(wq)) > + if (sunrpc_waitqueue_active(wq)) > wake_up_interruptible(wq); > } > > @@ -1594,7 +1619,7 @@ static void svc_sock_detach(struct svc_xprt *xprt) > sk->sk_write_space = svsk->sk_owspace; > > wq = sk_sleep(sk); > - if (wq && waitqueue_active(wq)) > + if (sunrpc_waitqueue_active(wq)) > wake_up_interruptible(wq); > } Best regards. --- Kosuke TATSUKAWA | 3rd IT Platform Department | IT Platform Division, NEC Corporation | tatsu@ab.jp.nec.com ^ permalink raw reply [flat|nested] 28+ messages in thread