All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Jason Baron <jbaron@akamai.com>
Cc: Oleg Nesterov <oleg@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Herbert Xu <herbert.xu@redhat.com>,
	Yauheni Kaliuta <yauheni.kaliuta@redhat.com>,
	netdev@vger.kernel.org
Subject: Re: wrong smp_mb__after_atomic() in tcp_check_space() ?
Date: Mon, 23 Jan 2017 10:04:31 -0800	[thread overview]
Message-ID: <1485194671.16328.195.camel@edumazet-glaptop3.roam.corp.google.com> (raw)
In-Reply-To: <f5f9c542-2b23-9df9-098c-ad4007bf7e78@akamai.com>

On Mon, 2017-01-23 at 11:56 -0500, Jason Baron wrote:
> On 01/23/2017 09:30 AM, Oleg Nesterov wrote:
> > Hello,
> >
> > smp_mb__after_atomic() looks wrong and misleading, sock_reset_flag() does the
> > non-atomic __clear_bit() and thus it can not guarantee test_bit(SOCK_NOSPACE)
> > (non-atomic too) won't be reordered.
> >
> 
> Indeed. Here's a bit of discussion on it:
> http://marc.info/?l=linux-netdev&m=146662325920596&w=2
> 
> > It was added by 3c7151275c0c9a "tcp: add memory barriers to write space paths"
> > and the patch looks correct in that we need the barriers in tcp_check_space()
> > and tcp_poll() in theory, so it seems tcp_check_space() needs smp_mb() ?
> >
> 
> Yes, I think it should be upgraded to an smp_mb() there. If you agree 
> with this analysis, I will send a patch to upgrade it. Note, I did not 
> actually run into this race in practice.

SOCK_QUEUE_SHRUNK is used locally in TCP, it is not used by tcp_poll().

(Otherwise it would be using atomic set/clear operations)

I do not see obvious reason why we have this smp_mb__after_atomic() in
tcp_check_space().


But looking at this code, it seems we lack one barrier if sk_sndbuf is
ever increased. Fortunately this almost never happen during TCP session
lifetime...

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bfa165cc455ad0a9aea44964aa663dbe6085aebd..3692e9f4c852cebf8c4d46c141f112e75e4ae66d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -331,8 +331,13 @@ static void tcp_sndbuf_expand(struct sock *sk)
 	sndmem = ca_ops->sndbuf_expand ? ca_ops->sndbuf_expand(sk) : 2;
 	sndmem *= nr_segs * per_mss;
 
-	if (sk->sk_sndbuf < sndmem)
+	if (sk->sk_sndbuf < sndmem) {
 		sk->sk_sndbuf = min(sndmem, sysctl_tcp_wmem[2]);
+		/* Paired with second sk_stream_is_writeable(sk)
+		 * test from tcp_poll()
+		 */
+		smp_wmb();
+	}
 }
 
 /* 2. Tuning advertised window (window_clamp, rcv_ssthresh)

  parent reply	other threads:[~2017-01-23 18:04 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-23 14:30 wrong smp_mb__after_atomic() in tcp_check_space() ? Oleg Nesterov
2017-01-23 16:56 ` Jason Baron
2017-01-23 17:18   ` Oleg Nesterov
2017-01-23 18:04   ` Eric Dumazet [this message]
2017-01-23 18:45     ` Jason Baron
2017-01-24  9:18     ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1485194671.16328.195.camel@edumazet-glaptop3.roam.corp.google.com \
    --to=eric.dumazet@gmail.com \
    --cc=davem@davemloft.net \
    --cc=herbert.xu@redhat.com \
    --cc=jbaron@akamai.com \
    --cc=netdev@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=yauheni.kaliuta@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.