[RFC][PATCH 0/3] TCP connection repair (v2)

* [RFC][PATCH 0/3] TCP connection repair (v2)
@ 2012-03-06  9:54 Pavel Emelyanov
  2012-03-06  9:55 ` [PATCH 1/3] tcp: Move code around Pavel Emelyanov
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Pavel Emelyanov @ 2012-03-06  9:54 UTC
  To: Linux Netdev List, David Miller, Tejun Heo, Eric Dumazet

Hi!

Attempt #2 at transparent TCP connection hijacking.

The idea, briefly, is to introduce a "repair" mode for a TCP socket. In this mode
any API call on the socket (bind/connect/sendmsg/etc.) does not result in packets
being sent over the network, but instead modifies the socket locally in the
expected way.

I.e., connect() in repair mode assigns the peer's credentials to the sock and
just moves it into the connected state without issuing SYNs or anything else. The
bind() call on a socket under repair forcibly binds it to the desired IP and
port, ignoring any (potential) local conflicts (just as if everybody else had
SO_REUSEADDR set). sendmsg() just queues data for transmission, etc.
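
To make the call sequence above concrete, here is a minimal userspace sketch.
This cover letter does not name the socket option, so the sketch assumes the
name and number used by mainline linux/tcp.h (TCP_REPAIR = 19), which may not
match this revision of the series. The helpers make_addr(), set_repair() and
restore_connection() are hypothetical names made up for illustration.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumption: option number as in mainline linux/tcp.h; the RFC may differ. */
#ifndef TCP_REPAIR
#define TCP_REPAIR 19
#endif

/* Hypothetical helper: fill an IPv4 sockaddr. Returns 0 or -EINVAL. */
static int make_addr(const char *ip, unsigned short port,
                     struct sockaddr_in *sa)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -EINVAL;
}

/* Hypothetical helper: switch repair mode on (1) or off (0). 0 or -errno. */
static int set_repair(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on)) ? -errno : 0;
}

/*
 * Hypothetical helper: recreate a connection between local and peer without
 * putting a handshake on the wire. Returns the new fd, or -errno on failure.
 */
static int restore_connection(const struct sockaddr_in *local,
                              const struct sockaddr_in *peer)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int err;

    if (fd < 0)
        return -errno;

    /* Enter repair mode while the socket is still closed.
     * Mainline kernels require CAP_NET_ADMIN for this. */
    err = set_repair(fd, 1);

    /* bind() takes the address even if it conflicts locally. */
    if (!err && bind(fd, (const struct sockaddr *)local, sizeof(*local)))
        err = -errno;

    /* connect() only records the peer and marks the socket connected. */
    if (!err && connect(fd, (const struct sockaddr *)peer, sizeof(*peer)))
        err = -errno;

    /* Leave repair mode so the socket behaves normally again. */
    if (!err)
        err = set_repair(fd, 0);

    if (err) {
        close(fd);
        return err;
    }
    return fd;
}
```

A caller would build both addresses with make_addr() and pass them to
restore_connection(); without the required privilege the call is expected to
fail at the first setsockopt().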

I think it makes sense to expose this ability as a non-obscure API, since
connection migration can be used not only by the checkpoint/restore project,
but also by various load-balancing solutions. E.g., a server can accept a
connection, read the app-level header out of the stream, make the balancing
decision based on _it_ (rather than just the TCP and/or IP header) and then pass
the existing connection to another host.

Changes since v1:

* Addressed (I hope) David's concern about the self-consistency of TCP
  sequence numbers.

Repair mode can be turned on only in "static" TCP states, i.e. TCP_CLOSE and
TCP_ESTABLISHED with the socket being locked. Only two sequence numbers can be
changed manually -- write_seq and copied_seq -- and only when the socket is in
the TCP_CLOSE state. The rest are maintained entirely by the kernel code
according to the protocol rules in the connect/sendmsg/etc. calls.

* Slight API change.

Instead of two separate socket options for the send and receive queue sequences,
I introduce one option that selects which queue is currently under repair and
another option that sets the sequence of that queue (as described above, this
works only in the TCP_CLOSE state). Yes, there are still two options for this,
but this approach helps with repairing the socket queues (see below).
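
As a rough illustration of the two-option scheme, the sketch below selects a
queue and then sets its sequence number. The option names and numbers are
assumptions taken from mainline linux/tcp.h (TCP_REPAIR_QUEUE = 20,
TCP_QUEUE_SEQ = 21) and may differ from this series; RQ_RECV/RQ_SEND,
set_queue_seq() and restore_seqs() are hypothetical names used only here.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <sys/socket.h>

/* Assumptions: option numbers as in mainline linux/tcp.h. */
#ifndef TCP_REPAIR_QUEUE
#define TCP_REPAIR_QUEUE 20
#endif
#ifndef TCP_QUEUE_SEQ
#define TCP_QUEUE_SEQ 21
#endif

/* Hypothetical local names; the values mirror mainline's
 * TCP_RECV_QUEUE (1) and TCP_SEND_QUEUE (2). */
enum repair_queue { RQ_RECV = 1, RQ_SEND = 2 };

/*
 * Hypothetical helper: pick the queue under repair, then set its sequence.
 * The socket is expected to be in repair mode and in the TCP_CLOSE state.
 * Returns 0 or -errno.
 */
static int set_queue_seq(int fd, enum repair_queue q, uint32_t seq)
{
    int qid = q;

    if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR_QUEUE, &qid, sizeof(qid)))
        return -errno;
    if (setsockopt(fd, IPPROTO_TCP, TCP_QUEUE_SEQ, &seq, sizeof(seq)))
        return -errno;
    return 0;
}

/* Hypothetical helper: restore write_seq (send queue) and copied_seq
 * (receive queue) before the repair-mode connect(). Returns 0 or -errno. */
static int restore_seqs(int fd, uint32_t write_seq, uint32_t copied_seq)
{
    int err = set_queue_seq(fd, RQ_SEND, write_seq);

    return err ? err : set_queue_seq(fd, RQ_RECV, copied_seq);
}
```

The point of the split is that the "which queue" selection is reusable: the
same selector that routes the sequence setting can also route recv/send.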

* Added support for queues repair.

In keeping with the overall idea of the "repair" mode, the recv/send syscalls
are used for this, and all they do is peek/poke data from/to the queues.
The queue-under-repair selected by the option described above makes it possible
to read from the send queue and write to the receive one. The caller must pass
the MSG_PEEK flag to recv in repair mode.
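
A small sketch of how the queue contents might be dumped and refilled from
userspace under this scheme. It assumes the queue-selection option is named
and numbered as in mainline linux/tcp.h (TCP_REPAIR_QUEUE = 20); RQ_RECV,
RQ_SEND, peek_queue() and fill_queue() are hypothetical names, and adding
MSG_DONTWAIT is my own precaution rather than something the series requires.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Assumption: option number as in mainline linux/tcp.h. */
#ifndef TCP_REPAIR_QUEUE
#define TCP_REPAIR_QUEUE 20
#endif

/* Hypothetical local names; the values mirror mainline's
 * TCP_RECV_QUEUE (1) and TCP_SEND_QUEUE (2). */
enum repair_queue { RQ_RECV = 1, RQ_SEND = 2 };

/* Hypothetical helper: choose which queue recv/send operate on. */
static int select_queue(int fd, enum repair_queue q)
{
    int qid = q;

    return setsockopt(fd, IPPROTO_TCP, TCP_REPAIR_QUEUE, &qid, sizeof(qid))
               ? -errno : 0;
}

/*
 * Hypothetical helper: copy data out of the chosen queue without consuming
 * it. MSG_PEEK is mandatory in repair mode. Returns bytes copied or -errno.
 */
static ssize_t peek_queue(int fd, enum repair_queue q, void *buf, size_t len)
{
    int err = select_queue(fd, q);
    ssize_t n;

    if (err)
        return err;
    n = recv(fd, buf, len, MSG_PEEK | MSG_DONTWAIT);
    return n < 0 ? -errno : n;
}

/*
 * Hypothetical helper: put data into the chosen queue. With RQ_RECV the
 * bytes land in the receive queue; with RQ_SEND they are queued for
 * transmission. Returns bytes queued or -errno.
 */
static ssize_t fill_queue(int fd, enum repair_queue q,
                          const void *buf, size_t len)
{
    int err = select_queue(fd, q);
    ssize_t n;

    if (err)
        return err;
    n = send(fd, buf, len, 0);
    return n < 0 ? -errno : n;
}
```

On the dumping side one would call peek_queue() once with RQ_SEND and once
with RQ_RECV; on the restoring side the saved bytes go back through
fill_queue() with the matching queue id.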

Thanks,
Pavel
