tcp: fix race condition when creating child sockets from syncookies

Message ID 20201023111352.GA289522@rdias-suse-pc.lan
State New, archived

Commit Message

Ricardo Dias Oct. 23, 2020, 11:13 a.m. UTC
When the TCP stack is in SYN flood mode, the server child socket is
created from the SYN cookie received in a TCP packet with the ACK flag
set.

The child socket is created when the server receives the first TCP
packet with a valid SYN cookie from the client. Usually, this packet
corresponds to the final step of the TCP 3-way handshake, the ACK
packet. But it is also possible to receive a valid SYN cookie in the
first TCP data packet sent by the client, and thus to create a child
socket from that SYN cookie.

Since a client socket is ready to send data as soon as it receives the
SYN+ACK packet from the server, the client can send the ACK packet (sent
by the TCP stack code) and the first data packet (sent by the userspace
program) almost at the same time, so the server can receive both TCP
packets, each carrying a valid SYN cookie, almost at the same instant.

When that happens, the TCP stack code has a race condition between the
moment the established connections hashtable is looked up to check for
an existing connection for the same client, and the moment the child
socket is added to the established connections hashtable. As a
consequence, this race condition can lead to two child sockets being
added to the established connections hashtable, and two sockets being
delivered to the userspace program for the same client.

This patch fixes the race condition by checking whether a child socket
already exists for the same client when adding the second child socket
to the established connections hashtable. If one exists, that socket is
returned and used to process the received TCP packet, and the second
child socket for the same client is discarded.

Signed-off-by: Ricardo Dias <rdias@memsql.com>
---
 include/net/inet_hashtables.h |  1 +
 net/ipv4/inet_hashtables.c    | 59 +++++++++++++++++++++++++++++++++++
 net/ipv4/syncookies.c         |  5 ++-
 net/ipv4/tcp_ipv4.c           | 16 +++++++++-
 4 files changed, 79 insertions(+), 2 deletions(-)

Comments

Eric Dumazet Oct. 23, 2020, 2:03 p.m. UTC | #1
On Fri, Oct 23, 2020 at 1:14 PM Ricardo Dias <rdias@memsql.com> wrote:
>
> When the TCP stack is in SYN flood mode, the server child socket is
> created from the SYN cookie received in a TCP packet with the ACK flag
> set.
>
...

This patch only handles IPv4, unless I am missing something ?

It looks like the fix should be done in inet_ehash_insert(), not
adding yet another helper in TCP.
This would be family generic.

Note that normally, all packets for the same 4-tuple should be handled
by the same cpu,
so this race is quite unlikely to happen in standard setups.
kernel test robot Oct. 23, 2020, 2:25 p.m. UTC | #2
Hi Ricardo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]
[also build test WARNING on net/master linus/master v5.9 next-20201023]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ricardo-Dias/tcp-fix-race-condition-when-creating-child-sockets-from-syncookies/20201023-191433
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 105faa8742437c28815b2a3eb8314ebc5fd9288c
config: arm-randconfig-r023-20201022 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 147b9497e79a98a8614b2b5eb4ba653b44f6b6f0)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/35d7202175fe2c313d66daf214ea113947d78c6d
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ricardo-Dias/tcp-fix-race-condition-when-creating-child-sockets-from-syncookies/20201023-191433
        git checkout 35d7202175fe2c313d66daf214ea113947d78c6d
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/ipv4/inet_hashtables.c:572:11: warning: variable 'dif' is uninitialized when used here [-Wuninitialized]
                                         dif, sdif))) {
                                         ^~~
   include/net/inet_hashtables.h:318:33: note: expanded from macro 'INET_MATCH'
            (((__sk)->sk_bound_dev_if == (__dif))          ||              \
                                          ^~~~~
   include/linux/compiler.h:77:40: note: expanded from macro 'likely'
   # define likely(x)      __builtin_expect(!!(x), 1)
                                               ^
   net/ipv4/inet_hashtables.c:553:15: note: initialize the variable 'dif' to silence this warning
           const int dif, sdif = sk->sk_bound_dev_if;
                        ^
                         = 0
   1 warning generated.

vim +/dif +572 net/ipv4/inet_hashtables.c

   539	
   540	/* Inserts a socket into ehash if no existing socket exists for the same
   541	 * quadruple (saddr, sport, daddr, dport).
   542	 * If there is an existing socket, returns that socket, otherwise returns NULL.
   543	 */
   544	struct sock *inet_ehash_insert_chk_dup(struct sock *sk)
   545	{
   546		struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
   547		struct hlist_nulls_head *list;
   548		struct inet_ehash_bucket *head;
   549		const struct hlist_nulls_node *node;
   550		struct sock *esk;
   551		spinlock_t *lock; /* protects hashinfo socket entry */
   552		struct net *net = sock_net(sk);
   553		const int dif, sdif = sk->sk_bound_dev_if;
   554	
   555		INET_ADDR_COOKIE(acookie, sk->sk_daddr, sk->sk_rcv_saddr);
   556		const __portpair ports = INET_COMBINED_PORTS(sk->sk_dport, sk->sk_num);
   557	
   558		WARN_ON_ONCE(!sk_unhashed(sk));
   559	
   560		sk->sk_hash = sk_ehashfn(sk);
   561		head = inet_ehash_bucket(hashinfo, sk->sk_hash);
   562		list = &head->chain;
   563		lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
   564	
   565		spin_lock(lock);
   566	begin:
   567		sk_nulls_for_each_rcu(esk, node, list) {
   568			if (esk->sk_hash != sk->sk_hash)
   569				continue;
   570			if (likely(INET_MATCH(esk, net, acookie,
   571					      sk->sk_daddr, sk->sk_rcv_saddr, ports,
 > 572					      dif, sdif))) {
   573				if (unlikely(!refcount_inc_not_zero(&esk->sk_refcnt)))
   574					goto out;
   575				if (unlikely(!INET_MATCH(esk, net, acookie,
   576							 sk->sk_daddr,
   577							 sk->sk_rcv_saddr, ports,
   578							 dif, sdif))) {
   579					sock_gen_put(esk);
   580					goto begin;
   581				}
   582				goto found;
   583			}
   584		}
   585	out:
   586		esk = NULL;
   587		__sk_nulls_add_node_rcu(sk, list);
   588	found:
   589		spin_unlock(lock);
   590		if (esk) {
   591			percpu_counter_inc(sk->sk_prot->orphan_count);
   592			inet_sk_set_state(sk, TCP_CLOSE);
   593			sock_set_flag(sk, SOCK_DEAD);
   594			inet_csk_destroy_sock(sk);
   595		}
   596		return esk;
   597	}
   598	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Ricardo Dias Oct. 23, 2020, 3:51 p.m. UTC | #3
On Fri, Oct 23, 2020 at 04:03:27PM +0200, Eric Dumazet wrote:
> On Fri, Oct 23, 2020 at 1:14 PM Ricardo Dias <rdias@memsql.com> wrote:
> >
> > When the TCP stack is in SYN flood mode, the server child socket is
> > created from the SYN cookie received in a TCP packet with the ACK flag
> > set.
> >
> ...
> 
> This patch only handles IPv4, unless I am missing something ?

Yes, currently the patch only handles IPv4. I'll improve it to also
handle the IPv6 case. 

> 
> It looks like the fix should be done in inet_ehash_insert(), not
> adding yet another helper in TCP.
> This would be family generic.

Ok, sounds good, as long as there is no problem in changing the
signature and semantics of the inet_ehash_insert() function, as well as
changing the inet_ehash_nolisten() function.

> 
> Note that normally, all packets for the same 4-tuple should be handled
> by the same cpu,
> so this race is quite unlikely to happen in standard setups.

I was able to write a small client/server program that uses the
loopback interface to create connections, and that hits the race
condition in roughly 1 out of 200 runs.

When the server accepts a connection, it sends an 8-byte identifier to
the client and then waits for the client to echo the same identifier
back. The client creates hundreds of simultaneous connections to the
server, and on each connection it sends one byte as soon as the
connection is established, then reads the 8-byte identifier from the
server and sends it back.

When we hit the race condition, one of the server connections receives
an 8-byte identifier different from its own.
Eric Dumazet Oct. 23, 2020, 3:56 p.m. UTC | #4
On Fri, Oct 23, 2020 at 5:51 PM Ricardo Dias <rdias@memsql.com> wrote:
>
> On Fri, Oct 23, 2020 at 04:03:27PM +0200, Eric Dumazet wrote:
> > On Fri, Oct 23, 2020 at 1:14 PM Ricardo Dias <rdias@memsql.com> wrote:
> > >
> > > When the TCP stack is in SYN flood mode, the server child socket is
> > > created from the SYN cookie received in a TCP packet with the ACK flag
> > > set.
> > >
> > ...
> >
> > This patch only handles IPv4, unless I am missing something ?
>
> Yes, currently the patch only handles IPv4. I'll improve it to also
> handle the IPv6 case.
>
> >
> > It looks like the fix should be done in inet_ehash_insert(), not
> > adding yet another helper in TCP.
> > This would be family generic.
>
> Ok, sounds good, as long as there is no problem in changing the
> signature and semantics of the inet_ehash_insert() function, as well as
> changing the inet_ehash_nolisten() function.
>
> >
> > Note that normally, all packets for the same 4-tuple should be handled
> > by the same cpu,
> > so this race is quite unlikely to happen in standard setups.
>
> I was able to write a small client/server program that uses the
> loopback interface to create connections, and that hits the race
> condition in roughly 1 out of 200 runs.
>
> When the server accepts a connection, it sends an 8-byte identifier to
> the client and then waits for the client to echo the same identifier
> back. The client creates hundreds of simultaneous connections to the
> server, and on each connection it sends one byte as soon as the
> connection is established, then reads the 8-byte identifier from the
> server and sends it back.
>
> When we hit the race condition, one of the server connections receives
> an 8-byte identifier different from its own.

That is on loopback, right ?

A server under syn flood is usually hit on a physical NIC, and a NIC
will always put all packets of a TCP flow in a single RX queue.
The cpu associated with this single RX queue won't process two packets
in parallel.

Note this issue is known, someone tried to fix it in the past but the
attempt went nowhere.
Ricardo Dias Oct. 23, 2020, 4:06 p.m. UTC | #5
On Fri, Oct 23, 2020 at 05:56:07PM +0200, Eric Dumazet wrote:
> On Fri, Oct 23, 2020 at 5:51 PM Ricardo Dias <rdias@memsql.com> wrote:
> >
> > On Fri, Oct 23, 2020 at 04:03:27PM +0200, Eric Dumazet wrote:
> > > On Fri, Oct 23, 2020 at 1:14 PM Ricardo Dias <rdias@memsql.com> wrote:
> > >
> > > ...
> > >
> > > Note that normally, all packets for the same 4-tuple should be handled
> > > by the same cpu,
> > > so this race is quite unlikely to happen in standard setups.
> >
> > I was able to write a small client/server program that uses the
> > loopback interface to create connections, and that hits the race
> > condition in roughly 1 out of 200 runs.
> >
> > When the server accepts a connection, it sends an 8-byte identifier to
> > the client and then waits for the client to echo the same identifier
> > back. The client creates hundreds of simultaneous connections to the
> > server, and on each connection it sends one byte as soon as the
> > connection is established, then reads the 8-byte identifier from the
> > server and sends it back.
> >
> > When we hit the race condition, one of the server connections receives
> > an 8-byte identifier different from its own.
> 
> That is on loopback, right ?

Yes, it's on loopback.

> 
> A server under syn flood is usually hit on a physical NIC, and a NIC
> will always put all packets of a TCP flow in a single RX queue.
> The cpu associated with this single RX queue won't process two packets
> in parallel.

And what about the loopback interface? Why couldn't the loopback
interface also use a single RX queue?

> 
> Note this issue is known, someone tried to fix it in the past but the
> attempt went nowhere.
Eric Dumazet Oct. 23, 2020, 4:36 p.m. UTC | #6
On Fri, Oct 23, 2020 at 6:06 PM Ricardo Dias <rdias@memsql.com> wrote:

> And what about the loopback interface? Why couldn't the loopback
> interface also use a single RX queue?
>

Loopback is using a per-cpu queue, with no crossing, for efficiency.

That means : whenever a packet is sent on lo interface from CPU X, it
is put on CPU X backlog queue.

If the connect() and sendmsg() are run from different cpus, then the
ACK (from last packet of 3WH) and the data packet might land on
different queues.
Ricardo Dias Oct. 23, 2020, 4:48 p.m. UTC | #7
On Fri, Oct 23, 2020 at 06:36:29PM +0200, Eric Dumazet wrote:
> On Fri, Oct 23, 2020 at 6:06 PM Ricardo Dias <rdias@memsql.com> wrote:
> 
> > And what about the loopback interface? Why couldn't the loopback
> > interface also use a single RX queue?
> >
> 
> Loopback is using a per-cpu queue, with no crossing, for efficiency.
> 
> That means : whenever a packet is sent on lo interface from CPU X, it
> is put on CPU X backlog queue.
> 
> If the connect() and sendmsg() are run from different cpus, then the
> ACK (from last packet of 3WH) and the data packet might land on
> different queues.

In that case, I can change the patch to iterate the ehash bucket only
when the listening socket is using the loopback interface, correct?
Eric Dumazet Oct. 23, 2020, 4:51 p.m. UTC | #8
On Fri, Oct 23, 2020 at 6:48 PM Ricardo Dias <rdias@memsql.com> wrote:
>

> In that case, I can change the patch to iterate the ehash bucket only
> when the listening socket is using the loopback interface, correct?

No, the fix should be generic.

We could still have bad NIC, or bad configuration, spreading packets
on random cpus ;)
Ricardo Dias Oct. 23, 2020, 4:54 p.m. UTC | #9
On Fri, Oct 23, 2020 at 06:51:05PM +0200, Eric Dumazet wrote:
> On Fri, Oct 23, 2020 at 6:48 PM Ricardo Dias <rdias@memsql.com> wrote:
> >
> 
> > In that case, I can change the patch to iterate the ehash bucket only
> > when the listening socket is using the loopback interface, correct?
> 
> No, the fix should be generic.
> 
> We could still have bad NIC, or bad configuration, spreading packets
> on random cpus ;)

Ah, got it!
Dan Carpenter Oct. 26, 2020, 8:27 a.m. UTC | #10
Hi Ricardo,

url:    https://github.com/0day-ci/linux/commits/Ricardo-Dias/tcp-fix-race-condition-when-creating-child-sockets-from-syncookies/20201023-191433
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 105faa8742437c28815b2a3eb8314ebc5fd9288c
config: i386-randconfig-m021-20201022 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
net/ipv4/inet_hashtables.c:570 inet_ehash_insert_chk_dup() error: uninitialized symbol 'dif'.

vim +/dif +570 net/ipv4/inet_hashtables.c

35d7202175fe2c3 Ricardo Dias 2020-10-23  544  struct sock *inet_ehash_insert_chk_dup(struct sock *sk)
35d7202175fe2c3 Ricardo Dias 2020-10-23  545  {
35d7202175fe2c3 Ricardo Dias 2020-10-23  546  	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
35d7202175fe2c3 Ricardo Dias 2020-10-23  547  	struct hlist_nulls_head *list;
35d7202175fe2c3 Ricardo Dias 2020-10-23  548  	struct inet_ehash_bucket *head;
35d7202175fe2c3 Ricardo Dias 2020-10-23  549  	const struct hlist_nulls_node *node;
35d7202175fe2c3 Ricardo Dias 2020-10-23  550  	struct sock *esk;
35d7202175fe2c3 Ricardo Dias 2020-10-23  551  	spinlock_t *lock; /* protects hashinfo socket entry */
35d7202175fe2c3 Ricardo Dias 2020-10-23  552  	struct net *net = sock_net(sk);
35d7202175fe2c3 Ricardo Dias 2020-10-23  553  	const int dif, sdif = sk->sk_bound_dev_if;
                                                      ^^^^^^^
"dif" is never initialized.

35d7202175fe2c3 Ricardo Dias 2020-10-23  554  
35d7202175fe2c3 Ricardo Dias 2020-10-23  555  	INET_ADDR_COOKIE(acookie, sk->sk_daddr, sk->sk_rcv_saddr);
35d7202175fe2c3 Ricardo Dias 2020-10-23  556  	const __portpair ports = INET_COMBINED_PORTS(sk->sk_dport, sk->sk_num);
35d7202175fe2c3 Ricardo Dias 2020-10-23  557  
35d7202175fe2c3 Ricardo Dias 2020-10-23  558  	WARN_ON_ONCE(!sk_unhashed(sk));
35d7202175fe2c3 Ricardo Dias 2020-10-23  559  
35d7202175fe2c3 Ricardo Dias 2020-10-23  560  	sk->sk_hash = sk_ehashfn(sk);
35d7202175fe2c3 Ricardo Dias 2020-10-23  561  	head = inet_ehash_bucket(hashinfo, sk->sk_hash);
35d7202175fe2c3 Ricardo Dias 2020-10-23  562  	list = &head->chain;
35d7202175fe2c3 Ricardo Dias 2020-10-23  563  	lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
35d7202175fe2c3 Ricardo Dias 2020-10-23  564  
35d7202175fe2c3 Ricardo Dias 2020-10-23  565  	spin_lock(lock);
35d7202175fe2c3 Ricardo Dias 2020-10-23  566  begin:
35d7202175fe2c3 Ricardo Dias 2020-10-23  567  	sk_nulls_for_each_rcu(esk, node, list) {
35d7202175fe2c3 Ricardo Dias 2020-10-23  568  		if (esk->sk_hash != sk->sk_hash)
35d7202175fe2c3 Ricardo Dias 2020-10-23  569  			continue;
35d7202175fe2c3 Ricardo Dias 2020-10-23 @570  		if (likely(INET_MATCH(esk, net, acookie,
35d7202175fe2c3 Ricardo Dias 2020-10-23  571  				      sk->sk_daddr, sk->sk_rcv_saddr, ports,
35d7202175fe2c3 Ricardo Dias 2020-10-23  572  				      dif, sdif))) {
                                                                              ^^^
warning.

35d7202175fe2c3 Ricardo Dias 2020-10-23  573  			if (unlikely(!refcount_inc_not_zero(&esk->sk_refcnt)))
35d7202175fe2c3 Ricardo Dias 2020-10-23  574  				goto out;
35d7202175fe2c3 Ricardo Dias 2020-10-23  575  			if (unlikely(!INET_MATCH(esk, net, acookie,
35d7202175fe2c3 Ricardo Dias 2020-10-23  576  						 sk->sk_daddr,
35d7202175fe2c3 Ricardo Dias 2020-10-23  577  						 sk->sk_rcv_saddr, ports,
35d7202175fe2c3 Ricardo Dias 2020-10-23  578  						 dif, sdif))) {
35d7202175fe2c3 Ricardo Dias 2020-10-23  579  				sock_gen_put(esk);
35d7202175fe2c3 Ricardo Dias 2020-10-23  580  				goto begin;
35d7202175fe2c3 Ricardo Dias 2020-10-23  581  			}
35d7202175fe2c3 Ricardo Dias 2020-10-23  582  			goto found;
35d7202175fe2c3 Ricardo Dias 2020-10-23  583  		}
35d7202175fe2c3 Ricardo Dias 2020-10-23  584  	}
35d7202175fe2c3 Ricardo Dias 2020-10-23  585  out:
35d7202175fe2c3 Ricardo Dias 2020-10-23  586  	esk = NULL;
35d7202175fe2c3 Ricardo Dias 2020-10-23  587  	__sk_nulls_add_node_rcu(sk, list);
35d7202175fe2c3 Ricardo Dias 2020-10-23  588  found:
35d7202175fe2c3 Ricardo Dias 2020-10-23  589  	spin_unlock(lock);
35d7202175fe2c3 Ricardo Dias 2020-10-23  590  	if (esk) {
35d7202175fe2c3 Ricardo Dias 2020-10-23  591  		percpu_counter_inc(sk->sk_prot->orphan_count);
35d7202175fe2c3 Ricardo Dias 2020-10-23  592  		inet_sk_set_state(sk, TCP_CLOSE);
35d7202175fe2c3 Ricardo Dias 2020-10-23  593  		sock_set_flag(sk, SOCK_DEAD);
35d7202175fe2c3 Ricardo Dias 2020-10-23  594  		inet_csk_destroy_sock(sk);
35d7202175fe2c3 Ricardo Dias 2020-10-23  595  	}
35d7202175fe2c3 Ricardo Dias 2020-10-23  596  	return esk;
35d7202175fe2c3 Ricardo Dias 2020-10-23  597  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

Patch

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 92560974ea67..e1d7fd20468a 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -248,6 +248,7 @@  void inet_hashinfo2_init(struct inet_hashinfo *h, const char *name,
 int inet_hashinfo2_init_mod(struct inet_hashinfo *h);
 
 bool inet_ehash_insert(struct sock *sk, struct sock *osk);
+struct sock *inet_ehash_insert_chk_dup(struct sock *sk);
 bool inet_ehash_nolisten(struct sock *sk, struct sock *osk);
 int __inet_hash(struct sock *sk, struct sock *osk);
 int inet_hash(struct sock *sk);
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 239e54474b65..5dbe3aa291e6 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -539,6 +539,65 @@  bool inet_ehash_insert(struct sock *sk, struct sock *osk)
 	return ret;
 }
 
+/* Inserts a socket into ehash if no existing socket exists for the same
+ * quadruple (saddr, sport, daddr, dport).
+ * If there is an existing socket, returns that socket, otherwise returns NULL.
+ */
+struct sock *inet_ehash_insert_chk_dup(struct sock *sk)
+{
+	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
+	struct hlist_nulls_head *list;
+	struct inet_ehash_bucket *head;
+	const struct hlist_nulls_node *node;
+	struct sock *esk;
+	spinlock_t *lock; /* protects hashinfo socket entry */
+	struct net *net = sock_net(sk);
+	const int dif, sdif = sk->sk_bound_dev_if;
+
+	INET_ADDR_COOKIE(acookie, sk->sk_daddr, sk->sk_rcv_saddr);
+	const __portpair ports = INET_COMBINED_PORTS(sk->sk_dport, sk->sk_num);
+
+	WARN_ON_ONCE(!sk_unhashed(sk));
+
+	sk->sk_hash = sk_ehashfn(sk);
+	head = inet_ehash_bucket(hashinfo, sk->sk_hash);
+	list = &head->chain;
+	lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
+
+	spin_lock(lock);
+begin:
+	sk_nulls_for_each_rcu(esk, node, list) {
+		if (esk->sk_hash != sk->sk_hash)
+			continue;
+		if (likely(INET_MATCH(esk, net, acookie,
+				      sk->sk_daddr, sk->sk_rcv_saddr, ports,
+				      dif, sdif))) {
+			if (unlikely(!refcount_inc_not_zero(&esk->sk_refcnt)))
+				goto out;
+			if (unlikely(!INET_MATCH(esk, net, acookie,
+						 sk->sk_daddr,
+						 sk->sk_rcv_saddr, ports,
+						 dif, sdif))) {
+				sock_gen_put(esk);
+				goto begin;
+			}
+			goto found;
+		}
+	}
+out:
+	esk = NULL;
+	__sk_nulls_add_node_rcu(sk, list);
+found:
+	spin_unlock(lock);
+	if (esk) {
+		percpu_counter_inc(sk->sk_prot->orphan_count);
+		inet_sk_set_state(sk, TCP_CLOSE);
+		sock_set_flag(sk, SOCK_DEAD);
+		inet_csk_destroy_sock(sk);
+	}
+	return esk;
+}
+
 bool inet_ehash_nolisten(struct sock *sk, struct sock *osk)
 {
 	bool ok = inet_ehash_insert(sk, osk);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index e03756631541..c4bb895085f0 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -208,7 +208,7 @@  struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
 
 	child = icsk->icsk_af_ops->syn_recv_sock(sk, skb, req, dst,
 						 NULL, &own_req);
-	if (child) {
+	if (child && own_req) {
 		refcount_set(&req->rsk_refcnt, 1);
 		tcp_sk(child)->tsoffset = tsoff;
 		sock_rps_save_rxhash(child, skb);
@@ -223,6 +223,9 @@  struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
 
 		bh_unlock_sock(child);
 		sock_put(child);
+	}  else if (child && !own_req) {
+		__reqsk_free(req);
+		return child;
 	}
 	__reqsk_free(req);
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 592c73962723..c705d335bd80 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1501,6 +1501,8 @@  struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
 	int l3index;
 #endif
 	struct ip_options_rcu *inet_opt;
+	struct sock *esk = NULL;
+	bool syncookie = false;
 
 	if (sk_acceptq_is_full(sk))
 		goto exit_overflow;
@@ -1535,6 +1537,7 @@  struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
 			goto put_and_exit;
 	} else {
 		/* syncookie case : see end of cookie_v4_check() */
+		syncookie = true;
 	}
 	sk_setup_caps(newsk, dst);
 
@@ -1565,7 +1568,18 @@  struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
 
 	if (__inet_inherit_port(sk, newsk) < 0)
 		goto put_and_exit;
-	*own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash));
+	if (!syncookie) {
+		*own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash));
+	} else {
+		esk = inet_ehash_insert_chk_dup(newsk);
+		/* We're going to notify tcp_get_cookie_sock through own_req
+		 * that an existing socket is going to be returned instead,
+		 * since own_req is not used by the syncookies code.
+		 */
+		*own_req = !esk;
+		if (esk)
+			newsk = esk;
+	}
 	if (likely(*own_req)) {
 		tcp_move_syn(newtp, req);
 		ireq->ireq_opt = NULL;