All of lore.kernel.org
 help / color / mirror / Atom feed
* wrong smp_mb__after_atomic() in tcp_check_space() ?
@ 2017-01-23 14:30 Oleg Nesterov
  2017-01-23 16:56 ` Jason Baron
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2017-01-23 14:30 UTC (permalink / raw)
  To: Jason Baron, David S. Miller; +Cc: Herbert Xu, Yauheni Kaliuta, netdev

Hello,

smp_mb__after_atomic() looks wrong and misleading, sock_reset_flag() does the
non-atomic __clear_bit() and thus it can not guarantee test_bit(SOCK_NOSPACE)
(non-atomic too) won't be reordered.

It was added by 3c7151275c0c9a "tcp: add memory barriers to write space paths"
and the patch looks correct in that we need the barriers in tcp_check_space()
and tcp_poll() in theory, so it seems tcp_check_space() needs smp_mb() ?

Oleg.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: wrong smp_mb__after_atomic() in tcp_check_space() ?
  2017-01-23 14:30 wrong smp_mb__after_atomic() in tcp_check_space() ? Oleg Nesterov
@ 2017-01-23 16:56 ` Jason Baron
  2017-01-23 17:18   ` Oleg Nesterov
  2017-01-23 18:04   ` Eric Dumazet
  0 siblings, 2 replies; 6+ messages in thread
From: Jason Baron @ 2017-01-23 16:56 UTC (permalink / raw)
  To: Oleg Nesterov, David S. Miller
  Cc: Herbert Xu, Yauheni Kaliuta, netdev, Eric Dumazet

On 01/23/2017 09:30 AM, Oleg Nesterov wrote:
> Hello,
>
> smp_mb__after_atomic() looks wrong and misleading, sock_reset_flag() does the
> non-atomic __clear_bit() and thus it can not guarantee test_bit(SOCK_NOSPACE)
> (non-atomic too) won't be reordered.
>

Indeed. Here's a bit of discussion on it:
http://marc.info/?l=linux-netdev&m=146662325920596&w=2

> It was added by 3c7151275c0c9a "tcp: add memory barriers to write space paths"
> and the patch looks correct in that we need the barriers in tcp_check_space()
> and tcp_poll() in theory, so it seems tcp_check_space() needs smp_mb() ?
>

Yes, I think it should be upgraded to an smp_mb() there. If you agree 
with this analysis, I will send a patch to upgrade it. Note, I did not 
actually run into this race in practice.

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: wrong smp_mb__after_atomic() in tcp_check_space() ?
  2017-01-23 16:56 ` Jason Baron
@ 2017-01-23 17:18   ` Oleg Nesterov
  2017-01-23 18:04   ` Eric Dumazet
  1 sibling, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2017-01-23 17:18 UTC (permalink / raw)
  To: Jason Baron
  Cc: David S. Miller, Herbert Xu, Yauheni Kaliuta, netdev, Eric Dumazet

On 01/23, Jason Baron wrote:
>
> >It was added by 3c7151275c0c9a "tcp: add memory barriers to write space paths"
> >and the patch looks correct in that we need the barriers in tcp_check_space()
> >and tcp_poll() in theory, so it seems tcp_check_space() needs smp_mb() ?
> >
>
> Yes, I think it should be upgraded to an smp_mb() there. If you agree with
> this analysis, I will send a patch to upgrade it.

I think this makes sense, if nothing else to remove the obviously wrong
smp_mb__after_atomic() ;)

> Note, I did not actually
> run into this race in practice.

Yes, I am not sure it is actually necessary in practice, and even if
tcp_check_space() races with tcp_poll() and misses SOCK_NOSPACE it should
be probably called again later, but I know nothing about networking code.

We noticed this by accident, we have a bug report which really looks as a
missed wakeup in epoll, but this kernel is old and in particular it lacks
the commit 128dd1759 "epoll: prevent missed events on EPOLL_CTL_MOD" which
looks more promising. But we are not sure it can fully explain the problem
we can't reproduce, so we are looking for anything else which may contribute
to the problem.

Oleg.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: wrong smp_mb__after_atomic() in tcp_check_space() ?
  2017-01-23 16:56 ` Jason Baron
  2017-01-23 17:18   ` Oleg Nesterov
@ 2017-01-23 18:04   ` Eric Dumazet
  2017-01-23 18:45     ` Jason Baron
  2017-01-24  9:18     ` Oleg Nesterov
  1 sibling, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2017-01-23 18:04 UTC (permalink / raw)
  To: Jason Baron
  Cc: Oleg Nesterov, David S. Miller, Herbert Xu, Yauheni Kaliuta, netdev

On Mon, 2017-01-23 at 11:56 -0500, Jason Baron wrote:
> On 01/23/2017 09:30 AM, Oleg Nesterov wrote:
> > Hello,
> >
> > smp_mb__after_atomic() looks wrong and misleading, sock_reset_flag() does the
> > non-atomic __clear_bit() and thus it can not guarantee test_bit(SOCK_NOSPACE)
> > (non-atomic too) won't be reordered.
> >
> 
> Indeed. Here's a bit of discussion on it:
> http://marc.info/?l=linux-netdev&m=146662325920596&w=2
> 
> > It was added by 3c7151275c0c9a "tcp: add memory barriers to write space paths"
> > and the patch looks correct in that we need the barriers in tcp_check_space()
> > and tcp_poll() in theory, so it seems tcp_check_space() needs smp_mb() ?
> >
> 
> Yes, I think it should be upgraded to an smp_mb() there. If you agree 
> with this analysis, I will send a patch to upgrade it. Note, I did not 
> actually run into this race in practice.

SOCK_QUEUE_SHRUNK is used locally in TCP, it is not used by tcp_poll().

(Otherwise it would be using atomic set/clear operations)

I do not see obvious reason why we have this smp_mb__after_atomic() in
tcp_check_space().


But looking at this code, it seems we lack one barrier if sk_sndbuf is
ever increased. Fortunately this almost never happen during TCP session
lifetime...

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bfa165cc455ad0a9aea44964aa663dbe6085aebd..3692e9f4c852cebf8c4d46c141f112e75e4ae66d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -331,8 +331,13 @@ static void tcp_sndbuf_expand(struct sock *sk)
 	sndmem = ca_ops->sndbuf_expand ? ca_ops->sndbuf_expand(sk) : 2;
 	sndmem *= nr_segs * per_mss;
 
-	if (sk->sk_sndbuf < sndmem)
+	if (sk->sk_sndbuf < sndmem) {
 		sk->sk_sndbuf = min(sndmem, sysctl_tcp_wmem[2]);
+		/* Paired with second sk_stream_is_writeable(sk)
+		 * test from tcp_poll()
+		 */
+		smp_wmb();
+	}
 }
 
 /* 2. Tuning advertised window (window_clamp, rcv_ssthresh)

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: wrong smp_mb__after_atomic() in tcp_check_space() ?
  2017-01-23 18:04   ` Eric Dumazet
@ 2017-01-23 18:45     ` Jason Baron
  2017-01-24  9:18     ` Oleg Nesterov
  1 sibling, 0 replies; 6+ messages in thread
From: Jason Baron @ 2017-01-23 18:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Oleg Nesterov, David S. Miller, Herbert Xu, Yauheni Kaliuta, netdev



On 01/23/2017 01:04 PM, Eric Dumazet wrote:
> On Mon, 2017-01-23 at 11:56 -0500, Jason Baron wrote:
>> On 01/23/2017 09:30 AM, Oleg Nesterov wrote:
>>> Hello,
>>>
>>> smp_mb__after_atomic() looks wrong and misleading, sock_reset_flag() does the
>>> non-atomic __clear_bit() and thus it can not guarantee test_bit(SOCK_NOSPACE)
>>> (non-atomic too) won't be reordered.
>>>
>>
>> Indeed. Here's a bit of discussion on it:
>> http://marc.info/?l=linux-netdev&m=146662325920596&w=2
>>
>>> It was added by 3c7151275c0c9a "tcp: add memory barriers to write space paths"
>>> and the patch looks correct in that we need the barriers in tcp_check_space()
>>> and tcp_poll() in theory, so it seems tcp_check_space() needs smp_mb() ?
>>>
>>
>> Yes, I think it should be upgraded to an smp_mb() there. If you agree
>> with this analysis, I will send a patch to upgrade it. Note, I did not
>> actually run into this race in practice.
>
> SOCK_QUEUE_SHRUNK is used locally in TCP, it is not used by tcp_poll().
>
> (Otherwise it would be using atomic set/clear operations)
>
> I do not see obvious reason why we have this smp_mb__after_atomic() in
> tcp_check_space().
>
>

The idea of the  smp_mb__after_atomic() in tcp_check_space() was to 
ensure that the 'read' of SOCK_NOSPACE there didn't happen before any of 
the 'write' to make sk_stream_is_writeable() true. Otherwise, we could 
miss doing the wakeup from tcp_check_space(). There is probably an 
argument here that there will likely be a subsequent call to 
tcp_check_space() that will see the SOCK_NOSPACE bit set, but in theory 
we could have a small send buffer, or a lot of data could be ack'd in 
one go.

What I missed in the original patch was that sock_reset_flag() isn't an 
atomic operation and thus the smp_mb__after_atomic() is wrong.


> But looking at this code, it seems we lack one barrier if sk_sndbuf is
> ever increased. Fortunately this almost never happen during TCP session
> lifetime...
>

But the wakeup from sk->sk_write_space(sk) will imply a smp_wmb() as per 
the comment in __wake_up() ?

Thanks,

-Jason


> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index bfa165cc455ad0a9aea44964aa663dbe6085aebd..3692e9f4c852cebf8c4d46c141f112e75e4ae66d 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -331,8 +331,13 @@ static void tcp_sndbuf_expand(struct sock *sk)
>  	sndmem = ca_ops->sndbuf_expand ? ca_ops->sndbuf_expand(sk) : 2;
>  	sndmem *= nr_segs * per_mss;
>
> -	if (sk->sk_sndbuf < sndmem)
> +	if (sk->sk_sndbuf < sndmem) {
>  		sk->sk_sndbuf = min(sndmem, sysctl_tcp_wmem[2]);
> +		/* Paired with second sk_stream_is_writeable(sk)
> +		 * test from tcp_poll()
> +		 */
> +		smp_wmb();
> +	}
>  }
>
>  /* 2. Tuning advertised window (window_clamp, rcv_ssthresh)
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: wrong smp_mb__after_atomic() in tcp_check_space() ?
  2017-01-23 18:04   ` Eric Dumazet
  2017-01-23 18:45     ` Jason Baron
@ 2017-01-24  9:18     ` Oleg Nesterov
  1 sibling, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2017-01-24  9:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jason Baron, David S. Miller, Herbert Xu, Yauheni Kaliuta, netdev

On 01/23, Eric Dumazet wrote:
>
> On Mon, 2017-01-23 at 11:56 -0500, Jason Baron wrote:
> > On 01/23/2017 09:30 AM, Oleg Nesterov wrote:
> > > Hello,
> > >
> > > smp_mb__after_atomic() looks wrong and misleading, sock_reset_flag() does the
> > > non-atomic __clear_bit() and thus it can not guarantee test_bit(SOCK_NOSPACE)
> > > (non-atomic too) won't be reordered.
> > >
> >
> > Indeed. Here's a bit of discussion on it:
> > http://marc.info/?l=linux-netdev&m=146662325920596&w=2
> >
> > > It was added by 3c7151275c0c9a "tcp: add memory barriers to write space paths"
> > > and the patch looks correct in that we need the barriers in tcp_check_space()
> > > and tcp_poll() in theory, so it seems tcp_check_space() needs smp_mb() ?
> > >
> >
> > Yes, I think it should be upgraded to an smp_mb() there. If you agree 
> > with this analysis, I will send a patch to upgrade it. Note, I did not 
> > actually run into this race in practice.
>
> SOCK_QUEUE_SHRUNK is used locally in TCP, it is not used by tcp_poll().
>
> (Otherwise it would be using atomic set/clear operations)
>
> I do not see obvious reason why we have this smp_mb__after_atomic() in
> tcp_check_space().

It is not that we need to serialize __clear_bit(SOCK_QUEUE_SHRUNK) and
test_bit(SOCK_NOSPACE), we do not care if they are reordered.

But we need to ensure that either tcp_poll() sees sk_stream_is_writeable()
or tcp_check_space() sees SOCK_NOSPACE and calls tcp_new_space().


> But looking at this code, it seems we lack one barrier if sk_sndbuf is
> ever increased. Fortunately this almost never happen during TCP session
> lifetime...
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index bfa165cc455ad0a9aea44964aa663dbe6085aebd..3692e9f4c852cebf8c4d46c141f112e75e4ae66d 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -331,8 +331,13 @@ static void tcp_sndbuf_expand(struct sock *sk)
>  	sndmem = ca_ops->sndbuf_expand ? ca_ops->sndbuf_expand(sk) : 2;
>  	sndmem *= nr_segs * per_mss;
>
> -	if (sk->sk_sndbuf < sndmem)
> +	if (sk->sk_sndbuf < sndmem) {
>  		sk->sk_sndbuf = min(sndmem, sysctl_tcp_wmem[2]);
> +		/* Paired with second sk_stream_is_writeable(sk)
> +		 * test from tcp_poll()
> +		 */
> +		smp_wmb();
> +	}
>  }

I do not think we need the additional barrier here. If we are going to call
sk->sk_write_space() we rely on wq_has_sleeper() which has a barrier which
also pairs with the 2nd check in tcp_poll().

Oleg.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-01-24  9:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-23 14:30 wrong smp_mb__after_atomic() in tcp_check_space() ? Oleg Nesterov
2017-01-23 16:56 ` Jason Baron
2017-01-23 17:18   ` Oleg Nesterov
2017-01-23 18:04   ` Eric Dumazet
2017-01-23 18:45     ` Jason Baron
2017-01-24  9:18     ` Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.